Theoretical Computer Science 134 (1994) 329-363


Elsevier 


Finite-memory automata* 


Michael Kaminski 


Department of Computer Science, The Hong Kong University of Science and Technology, 
Clear Water Bay, Kowloon, Hong Kong 


Nissim Francez 


Department of Computer Science, Technion — Israel Institute of Technology, Technion-city, 


Haifa, 32000, Israel 


Communicated by A.R. Meyer 
Received October 1993 


Abstract 


Kaminski, M., and N. Francez, Finite-memory automata, Theoretical Computer Science 134 (1994) 
329 - 363. 


A model of computation dealing with infinite alphabets is proposed. This model is based on replacing 
the equality test by substitution. It appears to be a natural generalization of the classical Rabin-Scott 
finite-state automata and possesses many of their closure and decision properties. Also, when 
restricted to finite alphabets the model is equivalent to finite-state automata. 


1. Introduction 


In this paper we introduce a model of computation dealing with infinite alphabets, 
a generalization of the classical Rabin-Scott finite-state automata [6]. In doing so, we 
are aiming towards a very restrictive model, capable of recognizing only the natural 
analog of regular languages over finite alphabets. In addition, we would like our 
model, and the class of languages recognizable by it, to enjoy as many of the 
properties of the family of finite automata and regular languages as possible. Thus we 
are interested in preserving closure under Kleene operations and Boolean operations, 
as well as in keeping the emptiness problem decidable. We succeed in doing so, 
except for closure under complementation, which is achievable only by passing to 
a restricted model. 

Clearly, our aim cannot be achieved by preserving finite-stateness in its strict 
meaning. On the other hand, allowing for arbitrary infinite-state automata is obvious- 
ly too powerful, as it renders every language recognizable, by having a state corres- 
ponding to each letter in the infinite alphabet, see [2]. Thus we need a very restricted 
memory structure of the automaton, so that it will not be able “to take advantage” of 
its memory capabilities beyond what is needed for our purposes. We also want to 
preserve the computability of the recognizable languages, so the model should not be 
powerful enough to do coding, or even counting. 

For example, the deterministic version of our automata should be similar, as an 
acceptor, to the programs defined below.¹ A program input is a sequence of atomic 
symbols over an infinite alphabet $\Sigma$, and a program itself consists of a specification of 
a finite set of variables $v_i$, $i = 1, 2, \ldots, r$, and a finite sequence of commands of the 
following types. 

• $v_i = \sigma$, where $\sigma \in \Sigma$. This operation is the initialization of the variable $v_i$. 

• read($v_i$). This operation (substitution) assigns the value of the next input symbol to 
the variable $v_i$. 

• type($v_i$). This operation prints the value of the variable $v_i$. 

• $v_i = v_j$. This operation assigns the value of $v_j$ to $v_i$. 

• if $v_i = v_j$, then go to $k$. This operation (equality test) passes the control to the $k$th 
operation in the sequence, if the value of $v_i$ is equal to the value of $v_j$, otherwise it 
skips to the next command in the sequence. 

• halt. This command terminates the program. 

Thus by restricting the manipulative power of the automaton to copying and 
comparing only, without the ability to apply any modification functions, all the 
automaton is capable of doing is to “remember” some bounded number of previously 
read symbols. Therefore, recognizable languages will have the usual characteristics of 
regular languages.² Typically, the automaton is able to detect the presence of a specific letter, or the 
appearance of simple patterns such as a repetition of the same letter, etc. 

Roughly speaking, the basic idea behind our definition is to equip the automaton 
with a finite set of proper states (as in the classic model), in which the “real computa- 
tion” is done. In addition, the automaton is equipped with a finite set of registers, each 
capable of being empty,³ or storing a symbol from the infinite alphabet. We refer to 
these registers as “windows.” The restricted power of the automaton is obtained by 
highly restricting its transition relation.⁴ Here, the novel idea is to replace the equality 
test (of the next input symbol to some element of the finite alphabet), which underlies 
the classic model, by some (extended form of) substitution. The latter employs both 
equality tests and copying. What the automaton does with the next input symbol is 
the following. If no window contains the input symbol, then it is copied into a specified 
window (depending on the state). Otherwise, the automaton “remembers” that the 
input symbol has been previously read, in which case the usual equality test applies. 
As it turns out, relating the next input symbol to many windows simultaneously 
leads to an easier development of the theory without strengthening the model, see 
Section 3. 

¹ These programs are a version of data-independent programs introduced in [10]. 

² Obviously, for a finite alphabet $\Sigma$ the above programs are equivalent to finite transducers, and the 
programs without the command type are just acceptors equivalent to finite automata. 

³ For technical convenience, emptiness is represented by holding a special symbol # that is not contained 
in the infinite alphabet. 

⁴ Our model is nondeterministic, so the transition is not a function. 

The idea of replacing equality test with substitution originates from [7], where it 
was applied to an infinite alphabet of a very specific structure, where the letters had 
the form $r(x_i, x_j)$ and were interpreted as binary relation symbols. Words over this 
alphabet have a structure bearing a strong relationship to Datalog, a useful database 
query language ([19]). Other useful interpretations of infinite alphabets are not hard 
to imagine. Finite sequences of simple patterns occur in many contexts within 
computer science. Actions of concurrent processes, when concurrency and commun- 
ication are restricted to very simple patterns, are another possible interpretation of 
infinite alphabets. 

A beneficial by-product of our theory is an ability to represent in a more compact 
way some classical finite-state automata, in case the finite alphabet is so large that it is best 
treated as if it were infinite. Thus process identifiers (in computer 
networks), for example, will usually be natural numbers. Any practical network will have some 
bound on these numbers. However, such bounds may be determined in some complic- 
ated, architecture-dependent way, and at a higher level of abstraction it is best to 
ignore these bounds and consider arbitrary natural numbers. 

An important facet of our theory is a certain indistinguishability view of the finite 
alphabet embedded in the modus operandi of the automaton. Languages are only 
unique up to automorphisms of the alphabet.⁵ Thus the actual letters occurring in the 
input are of no real significance. Only the initial and repetition patterns matter. This 
follows from the nature of substitution. If a new letter (i.e., one not in any window) is 
copied and later successfully compared, any other new letter, having appeared in the 
same position, would cause the same transitions. This, in particular, implies that the 
usual pumping lemma does not hold in our model. 

One can try to extend our theory to other models of computation, such as automata 
on infinite words and pushdown automata. Here, we restricted the presentation to the 
(analog of the) regular case only. However, the results have nontrivial proofs and 
establish the basic techniques that might be applicable to such extensions. 

The paper is organized as follows. In the next section we define the model of 
computation and present some of its basic properties. Section 3 deals with various 
closure properties, and in Section 4 we consider the deterministic and two-way 
models. The last section contains a concluding discussion and a list of open problems 
which seem to us to be of interest. We conclude the paper with an appendix which 
contains a decidability result. All proofs in this paper are constructive. 


⁵ Actually, only automorphisms which keep fixed those symbols to which windows are initialized are 
considered. 


2. Definitions and basic properties 


In this section we define the model of computation and show some of its basic 
properties. 

Let $\Sigma$ be an infinite alphabet and let # be a symbol not belonging to $\Sigma$. 
An assignment is a word $w_1 w_2 \ldots w_r \in (\Sigma \cup \{\#\})^*$ such that if $w_i = w_j$ and $i \neq j$, 
then $w_i = \#$. That is, an assignment is a word over $\Sigma \cup \{\#\}$ where each symbol 
from $\Sigma$ appears at most one time. According to the informal description of the 
model presented in the introduction, assignments represent the contents of 
the registers of the automaton: the symbol in the $i$th register (window) is $w_i$. 
If $w_i = \#$, then the $i$th window is empty. In our model we assume that a 
symbol cannot simultaneously appear in two or more windows. This restriction 
does not weaken the model, see Definition 2 and Theorem 2 in the next 
section. 

For a word $w = w_1 w_2 \ldots w_r \in (\Sigma \cup \{\#\})^*$ we define the contents of $w$, denoted $[w]$, by 
$[w] = \{w_i : i = 1, 2, \ldots, r\}$, i.e., $[w]$ consists exactly of those symbols of $\Sigma \cup \{\#\}$ which 
appear in $w$. 
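
The two notions just defined can be checked mechanically. The following is a minimal sketch, assuming that symbols are Python characters and that the string "#" plays the role of the empty-window marker; the names `EMPTY`, `contents` and `is_assignment` are illustrative and do not come from the paper.

```python
EMPTY = "#"  # assumed stand-in for the special symbol that marks an empty window


def contents(word):
    """Return [w]: the set of symbols (possibly including '#') occurring in w."""
    return set(word)


def is_assignment(word):
    """True iff every symbol other than '#' occurs at most once in the word."""
    stored = [symbol for symbol in word if symbol != EMPTY]
    return len(stored) == len(set(stored))


print(is_assignment("a#b#"))     # '#' may repeat, symbols of the alphabet may not
print(is_assignment("a#a"))      # 'a' occupies two windows
print(sorted(contents("a#b#")))
```

Note that the marker itself may occur any number of times, which is exactly what the condition "if $w_i = w_j$ and $i \neq j$, then $w_i = \#$" allows.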


Definition 1. A finite-memory automaton is a system $A = \langle S, q_0, u, \rho, \mu, F \rangle$, where 

• $S$ is a finite set of states. 

• $q_0 \in S$ is the initial state. 

• $u = w_1 w_2 \ldots w_r \in (\Sigma \cup \{\#\})^r$ is the initial assignment (the registers' initialization). 

• $\rho : S \to \{1, 2, \ldots, r\}$ is a partial function from $S$ to $\{1, 2, \ldots, r\}$ called the 
reassignment. The intuitive meaning of $\rho$ is as follows. If $A$ is in 
state $s$, $\rho(s)$ is defined, and the input symbol appears in no window, then $A$ 
“forgets” the contents of the $\rho(s)$th window and copies the input symbol into that 
window. 

• $\mu \subseteq S \times \{1, 2, \ldots, r\} \times S$ is the transition relation. The intuitive meaning of $\mu$ is as 
follows. If $A$ is in state $s$, the input symbol is equal to the contents of the $k$th 
window, and $(s, k, t) \in \mu$, then $A$ may enter state $t$. In addition, if the input symbol 
appears in no window and is placed into the $k$th window ($k = \rho(s)$), then in order to 
enter state $t$ the transition relation must contain $(s, k, t)$. That is, the reassignment is 
made prior to a transition. 

• $F \subseteq S$ is the set of final states. 


The initial assignment of automaton $A$ and its length are denoted by $u_A$ and $r_A$, 
respectively. That is, $u = u_A$ and $r = r_A$. 

Similar to the case of finite automata, $A$ can be represented by its initial assignment 
and a directed graph whose nodes are states. There is an edge from $s$ to $t$, if there exists 
$k$ such that $(s, k, t) \in \mu$. Such an edge is labeled $k$. Also, if for a node $s$ the value of $\rho$ is 
defined, then $s$ is labeled $\rho(s)$. For the graph representation of finite-memory automata, 
see Example 1 below. 

An actual state of $A$ is a state of $S$ together with the contents of all windows. Thus 
$A$ has infinitely many states⁶ which are pairs $(s, w)$, where $s \in S$ and $w$ is an assignment 
of length $r$. These are called the configurations of $A$. The set of all configurations of $A$ is 
denoted by $S^c$. The pair $q_0^c = (q_0, u)$ is called the initial configuration, and the 
configurations with the first component in $F$ are called final configurations. The set of final 
configurations is denoted by $F^c$. 

The transition relation $\mu$ induces the following relation $\mu^c$ on $S^c \times \Sigma \times S^c$. 

Let $w = w_1 w_2 \ldots w_r$ and $v = v_1 v_2 \ldots v_r$. Then $((s, w), \sigma, (t, v)) \in \mu^c$ if and only if the two 
following conditions are satisfied. 

• If $\sigma = w_k \in [w]$, then $v = w$ and $(s, k, t) \in \mu$. 

• If $\sigma \notin [w]$, then $\rho(s)$ is defined, $v_{\rho(s)} = \sigma$, for each $k \neq \rho(s)$, $v_k = w_k$, and $(s, \rho(s), t) \in \mu$. 

Let $\boldsymbol{\sigma} = \sigma_1 \sigma_2 \ldots \sigma_n$ be a word over $\Sigma$. A run of $A$ on $\boldsymbol{\sigma}$ consists of a sequence of 
configurations $c_0, c_1, \ldots, c_n$ such that $c_0 = q_0^c$ and $(c_{i-1}, \sigma_i, c_i) \in \mu^c$, $i = 1, 2, \ldots, n$. 

We say that $A$ accepts $\boldsymbol{\sigma} \in \Sigma^*$, if there exists a run $c_0, c_1, \ldots, c_n$ of $A$ on $\boldsymbol{\sigma}$ such that 
$c_n \in F^c$. The set of all words accepted by $A$ is denoted by $L(A)$ and is referred to as 
a quasi-regular set. 
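
The induced relation and the acceptance condition can be sketched in code. The following is an illustration under stated assumptions, not the authors' implementation: words are Python strings, "#" marks an empty window, $\rho$ is a dictionary (a missing key means "undefined"), and $\mu$ is a set of triples. The names `step`, `accepts`, `RHO` and `MU` are hypothetical, and the sample two-window automaton is our reading of the repeated-symbol automaton discussed in the examples.

```python
EMPTY = "#"  # marks an empty window; assumed never to occur as an input symbol


def step(config, symbol, rho, mu):
    """Return all configurations reachable from `config` on `symbol`."""
    state, windows = config
    if symbol == EMPTY:
        raise ValueError("the empty marker is not an input symbol")
    if symbol in windows:
        # Equality test: the symbol already sits in window k, nothing is copied.
        k = windows.index(symbol) + 1
        new_windows = windows
    elif state in rho:
        # Substitution: a new symbol overwrites window rho(state).
        k = rho[state]
        new_windows = windows[:k - 1] + (symbol,) + windows[k:]
    else:
        # New symbol, but no reassignment is defined: this run is stuck.
        return set()
    return {(t, new_windows) for (s, j, t) in mu if s == state and j == k}


def accepts(word, q0, u, rho, mu, final):
    """Track every configuration reachable on each prefix of `word`."""
    configs = {(q0, tuple(u))}
    for symbol in word:
        configs = {nxt for c in configs for nxt in step(c, symbol, rho, mu)}
    return any(state in final for state, _ in configs)


# Illustrative two-window automaton: accept words in which some symbol repeats.
RHO = {"q0": 1, "q": 2, "f": 2}
MU = {("q0", 1, "q0"), ("q0", 1, "q"), ("q", 1, "f"),
      ("q", 2, "q"), ("f", 1, "f"), ("f", 2, "f")}

print(accepts("abcbd", "q0", "##", RHO, MU, {"f"}))  # b repeats
print(accepts("abcd", "q0", "##", RHO, MU, {"f"}))   # all symbols distinct
```

Tracking the whole set of reachable configurations is one simple way to handle nondeterminism; the set stays finite because a run on a fixed word only ever stores symbols of that word or of the initial assignment.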

Before considering some general properties of finite-memory automata, we present 
several examples which show the difference and similarity between the models over 
finite and infinite alphabets. 


Example 1. Consider a finite-memory automaton $A = \langle \{q_0, q, f\}, q_0, \#\#, \rho, \mu, \{f\} \rangle$, 
where 
• $\rho(q_0) = 1$, $\rho(q) = \rho(f) = 2$; and 
• $\mu = \{(q_0, 1, q_0), (q_0, 1, q), (q, 1, f), (q, 2, q), (f, 1, f), (f, 2, f)\}$. 

This automaton has the graph representation shown in Fig. 1. 

One can easily verify that the language $L_1 = L(A)$ consists exactly of those words 
where some element of $\Sigma$ appears twice or more: 


$$L_1 = \{\sigma_1 \sigma_2 \ldots \sigma_n : \text{there exist } 1 \le i < j \le n \text{ such that } \sigma_i = \sigma_j\}.$$


The behavior of $A$ on such words is as follows. Being in the initial state, the automaton 
stores new input symbols in the first window. When reading a symbol that appears 
twice, $A$ changes the state to $q$ and, storing new input symbols in the second window, 
waits for the second appearance of the symbol stored in the first window. Then it 
enters the final state. For example, $abcbd \in L_1$, because $b$ appears twice in that word. 
An accepting run of $A$ on $abcbd$ is $(q_0, \#\#), (q_0, a\#), (q, b\#), (q, bc), (f, bc), (f, bd)$. We 
shall see in the sequel that the complement of $L_1$ (with respect to $\Sigma^*$) is not 
quasi-regular. 


⁶ This is the major difference between finite and finite-memory automata. In particular, since 
finite-memory automata have infinitely many actual states, the pumping lemma does not hold for quasi-regular 
languages, see Example 3. For the same reason the computation power of the deterministic and two-way 
models differs from that of the one-way nondeterministic one, see Section 4. 


Fig. 1. (Graph representation of the automaton of Example 1; the drawing is not reproduced here.) 


Example 2. Let $\Sigma' = \{\sigma_1, \sigma_2, \ldots, \sigma_r\}$ be an $r$-element subset of $\Sigma$ and let 
$A' = \langle S, q_0, \mu', F \rangle$ be a finite automaton⁷ over $\Sigma'$. Consider a finite-memory 
automaton $A = \langle S, q_0, u, \rho, \mu, F \rangle$, where $u = \sigma_1 \sigma_2 \ldots \sigma_r$, the reassignment $\rho$ is nowhere defined 
and $(s, k, t) \in \mu$ if and only if $(s, \sigma_k, t) \in \mu'$. It immediately follows that $L(A) = L(A')$. 
Therefore every regular language is quasi-regular. 
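
The construction of Example 2 is short enough to write out. The sketch below assumes the finite automaton is given as a list of distinct symbols and a set of triples; the function names and the sample automaton (words over {a, b} that end in b) are hypothetical. Since the reassignment is nowhere defined, the windows never change, so acceptance reduces to an ordinary subset simulation over window indices.

```python
def finite_automaton_to_fma(symbols, q0, mu_prime, final):
    """Embed a finite automaton over `symbols` as a finite-memory automaton.

    Window k is initialised to the k-th symbol and rho is nowhere defined.
    """
    index = {symbol: k for k, symbol in enumerate(symbols, start=1)}
    u = tuple(symbols)
    rho = {}  # the reassignment is nowhere defined
    mu = {(s, index[symbol], t) for (s, symbol, t) in mu_prime}
    return q0, u, rho, mu, final


def accepts_fixed_windows(word, q0, u, mu, final):
    """Acceptance when rho is nowhere defined: the windows stay equal to u."""
    states = {q0}
    for symbol in word:
        if symbol not in u:
            return False  # a new symbol cannot be stored, so every run is stuck
        k = u.index(symbol) + 1
        states = {t for (s, j, t) in mu if s in states and j == k}
    return bool(states & final)


# Hypothetical finite automaton over {a, b}: accept the words that end in b.
MU_PRIME = {("p", "a", "p"), ("p", "b", "q"), ("q", "a", "p"), ("q", "b", "q")}
Q0, U, RHO, MU, FINAL = finite_automaton_to_fma(["a", "b"], "p", MU_PRIME, {"q"})

print(sorted(MU))
print(accepts_fixed_windows("ab", Q0, U, MU, FINAL))
```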


The next example shows that the pumping lemma does not hold for quasi-regular 
languages. 


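⁷ That is, $S$ is a finite set of states, $q_0 \in S$ is the initial state, $\mu' \subseteq S \times \Sigma' \times S$ is the transition relation, and 
$F \subseteq S$ is the set of final states. The sequence of states $s_0, s_1, \ldots, s_n$ is an accepting run of $A'$ on the word 
$\sigma_1 \sigma_2 \ldots \sigma_n$, if $s_0 = q_0$, $s_n \in F$, and $(s_{i-1}, \sigma_i, s_i) \in \mu'$, $i = 1, 2, \ldots, n$. 


Example 3. Let $A$ be a finite-memory automaton defined by the diagram of Fig. 2. 
Notice that the reassignment function is not defined for either of $q_2$ and $q_4$. Therefore 
the automaton can leave $q_2$ if and only if the input symbol appears in the first window, 
and can leave $q_4$ if and only if the input symbol appears in the second window. 


Fig. 2. (Diagram of the automaton of Example 3; the drawing is not reproduced here.) 


Let $n \ge 1$ and let $\tau_0, \tau_1, \ldots, \tau_{2n}$ be distinct elements of $\Sigma$. Consider a word 
$\boldsymbol{\sigma} = \sigma_1 \sigma_2 \ldots \sigma_{4n+2}$, where $\sigma_1 = \sigma_3 = \tau_0$, $\sigma_{4n} = \sigma_{4n+2} = \tau_{2n}$, and $\sigma_{2i} = \sigma_{2i+3} = \tau_i$ for 
$i = 1, 2, \ldots, 2n-1$. That is, $\boldsymbol{\sigma}$ is of the form 


$$\tau_0 \, \tau_1 \, \tau_0 \, \tau_2 \, \tau_1 \, \tau_3 \, \tau_2 \, \ldots \, \tau_{2n} \, \tau_{2n-1} \, \tau_{2n},$$


where each symbol occurs exactly twice (in the original layout, brackets connect the equal $\sigma$s). A straightforward induction shows that 
$c_0, c_1, \ldots, c_{4n+2}$, where 


$$c_0 = (q_0, \#\#), \quad c_1 = (q_1, \tau_0 \#),$$

$$c_{4i+2} = (q_2, \tau_{2i} \tau_{2i+1}), \quad c_{4i+3} = (q_3, \tau_{2i} \tau_{2i+1}), \quad c_{4i+4} = (q_4, \tau_{2i+2} \tau_{2i+1}),$$

$$c_{4i+5} = (q_1, \tau_{2i+2} \tau_{2i+1}), \quad i = 0, 1, \ldots, n-1, \quad \text{and} \quad c_{4n+2} = (f, \tau_{2n} \tau_{2n-1}),$$


is an accepting run of $A$ on $\boldsymbol{\sigma}$. Thus $\boldsymbol{\sigma} \in L(A)$. We claim that $\boldsymbol{\sigma}$ contains no nonempty 
pattern that may be pumped. To prove our claim, assume to the contrary, that 
$\boldsymbol{\sigma} = xyz$, where $y = \sigma_l \sigma_{l+1} \ldots \sigma_{m-1}$ is nonempty and for some $j > 1$, $x y^j z \in L(A)$. An 
inspection of the automaton's diagram shows that every word accepted by $A$ must be 
of the length $4k+2$, $k = 0, 1, \ldots$ Therefore $m - l$ is a positive multiple of 4, which 
implies $m - l \ge 4$. Then $x y^j z$ contains the pattern $\sigma_{m-2} \sigma_{m-1} \sigma_l \sigma_{l+1} \sigma_{l+2}$. Let 
$(s_0, w_0), (s_1, w_1), \ldots, (s_{m+2}, w_{m+2})$ be a run of $A$ on $\sigma_1 \sigma_2 \ldots \sigma_{m-2} \sigma_{m-1} \sigma_l \sigma_{l+1} \sigma_{l+2}$. Since 
on each input symbol, from each configuration, $A$ can make at most one move, and 
$\boldsymbol{\sigma}$ and $x y^j z$ have the same prefixes of length $m-1$, $(s_{m-1}, w_{m-1}) = c_{m-1}$. We distinguish 
between the cases of $m-1 = 4i+2$, $m-1 = 4i+3$, $m-1 = 4i+4$, and $m-1 = 4i+5$. 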

Assume $m-1 = 4i+2$. Then $(s_{m-1}, w_{m-1}) = c_{m-1} = c_{4i+2} = (q_2, \tau_{2i} \tau_{2i+1})$. The 
automaton can leave $q_2$ if and only if $\sigma_l = \tau_{2i}$, which, by the definition of $\boldsymbol{\sigma}$, holds if and 
only if either $l = 4i = m-3$ or $l = 4i+3 = m$. Either of the equalities leads to a 
contradiction, because $m - l \ge 4$. 

Assume $m-1 = 4i+3$. Then $(s_{m-1}, w_{m-1}) = c_{m-1} = c_{4i+3} = (q_3, \tau_{2i} \tau_{2i+1})$, implying 
$(s_m, w_m) = (q_4, \sigma_l \tau_{2i+1})$. The automaton can leave $q_4$ if and only if $\sigma_{l+1} = \tau_{2i+1}$. By the 
definition of $\boldsymbol{\sigma}$, the equality holds if and only if either $l+1 = 4i+2 = m-2$ or 
$l+1 = 4i+5 = m+1$. Since $m - l \ge 4$, this is impossible. 

Assume $m-1 = 4i+4$. Then $(s_{m-1}, w_{m-1}) = c_{m-1} = c_{4i+4} = (q_4, \tau_{2i+2} \tau_{2i+1})$. The 
automaton can leave $q_4$ if and only if $\sigma_l = \tau_{2i+1}$, which is impossible. 

Assume $m-1 = 4i+5$. Then $(s_{m-1}, w_{m-1}) = c_{m-1} = c_{4i+5} = (q_1, \tau_{2i+2} \tau_{2i+1})$, 
implying $(s_m, w_m) = (q_2, \tau_{2i+2} \sigma_l)$. The automaton can leave $q_2$ if and only if $\sigma_{l+1} = \tau_{2i+2}$, which 
is impossible. 

Thus $x y^j z \notin L(A)$, which shows that $L(A)$ does not satisfy the pumping lemma for 
regular languages. 

Since the restriction of the set of configurations to a finite alphabet is finite, we have 
the following result. 


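Proposition 1. Let $A = \langle S, q_0, u, \rho, \mu, F \rangle$ be a finite-memory automaton and let $\Sigma'$ be 
a finite subset of $\Sigma$. Then $L(A) \cap \Sigma'^*$ is a regular language (over $\Sigma'$). 


Proof. Consider a finite automaton $A' = \langle S', q_0', \mu', F' \rangle$ over $\Sigma'$ that is defined as 
follows. 
• $S' = S^c \cap (S \times (\Sigma' \cup [u] \cup \{\#\})^r)$. Since $\Sigma'$ is finite, $S'$ is finite as well. 
• $q_0' = (q_0, u)$. 
• $\mu' = \mu^c \cap (S' \times \Sigma' \times S')$. 
• $F' = F^c \cap S'$. 

Let $\boldsymbol{\sigma} \in \Sigma'^*$. It immediately follows from the construction above that each accepting 
run of $A$ on $\boldsymbol{\sigma}$ is an accepting run of $A'$ on $\boldsymbol{\sigma}$, and vice versa. Thus $\boldsymbol{\sigma} \in L(A) \cap \Sigma'^*$ if and 
only if $\boldsymbol{\sigma} \in L(A')$. 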


Propositions 2 and 3 reflect the fact that a finite-memory automaton can only sense 
“new” input symbols, i.e., ones not appearing in the contents of the most recent 
assignment. It cannot, however, distinguish between different “new” symbols. The above 
property of finite-memory automata is a useful tool for proving that a language is not 
quasi-regular, see Example 4 below.⁸ Note that an occurrence of an input symbol that 
belonged to a previous assignment and was “forgotten” in a reassignment is also 
“new.” We need the following auxiliary result. 


Lemma 1. Let $A = \langle S, q_0, u, \rho, \mu, F \rangle$ be a finite-memory automaton. Then for each 
automorphism $\iota : \Sigma \to \Sigma$ we have $\iota(L(A)) = L(A_{(q_0, \iota(u))})$, where $A_{(q_0, \iota(u))} = 
\langle S, q_0, \iota(u), \rho, \mu, F \rangle$.⁹ 


Proof. We contend that $(s_0, w_0), (s_1, w_1), \ldots, (s_n, w_n)$ is a run of $A$ on $\boldsymbol{\sigma}$ if and only if 
$(s_0, \iota(w_0)), (s_1, \iota(w_1)), \ldots, (s_n, \iota(w_n))$ is a run of $A_{(q_0, \iota(u))}$ on $\iota(\boldsymbol{\sigma})$. First we prove by 
induction on the length of $\boldsymbol{\sigma}$, denoted by $n$, that if $(s_0, w_0), (s_1, w_1), \ldots, (s_n, w_n)$ is a run of 
$A$ on $\boldsymbol{\sigma}$, then $(s_0, \iota(w_0)), (s_1, \iota(w_1)), \ldots, (s_n, \iota(w_n))$ is a run of $A_{(q_0, \iota(u))}$ on $\iota(\boldsymbol{\sigma})$. Obviously, 
this is true if $\boldsymbol{\sigma}$ is the empty word. Assume that the claim holds for all words of length $n$. 
Let $(s_0, w_0), (s_1, w_1), \ldots, (s_n, w_n), (s_{n+1}, w_{n+1})$ be a run of $A$ on $\boldsymbol{\sigma}\sigma$. We have to prove 
that $(s_0, \iota(w_0)), (s_1, \iota(w_1)), \ldots, (s_n, \iota(w_n)), (s_{n+1}, \iota(w_{n+1}))$ is a run of $A_{(q_0, \iota(u))}$ on $\iota(\boldsymbol{\sigma})\iota(\sigma)$. 
Since, by the induction hypothesis, $(s_0, \iota(w_0)), (s_1, \iota(w_1)), \ldots, (s_n, \iota(w_n))$ is a run of 
$A_{(q_0, \iota(u))}$ on $\iota(\boldsymbol{\sigma})$, it suffices to show that $((s_n, \iota(w_n)), \iota(\sigma), (s_{n+1}, \iota(w_{n+1}))) \in \mu^c$. 

If for some $k = 1, 2, \ldots, r$, $\sigma$ is the $k$th symbol of $w_n$ (so that $\sigma \in [w_n]$), then $(s_n, k, s_{n+1}) \in \mu$, and $w_{n+1} = w_n$. 
Therefore $\iota(\sigma)$ appears in the $k$th position of $\iota(w_n)$. Hence $((s_n, \iota(w_n)), \iota(\sigma), (s_{n+1}, \iota(w_{n+1}))) \in \mu^c$. 

If $\sigma \notin [w_n]$, then $(s_n, \rho(s_n), s_{n+1}) \in \mu$, and $w_{n+1}$ is obtained from $w_n$ by replacing 
the content of the $\rho(s_n)$th window with $\sigma$. Since $\iota(\sigma) \notin [\iota(w_n)]$, $\iota(w_{n+1})$ is obtained from 
$\iota(w_n)$ by replacing the content of the $\rho(s_n)$th window with $\iota(\sigma)$. Again, 
$((s_n, \iota(w_n)), \iota(\sigma), (s_{n+1}, \iota(w_{n+1}))) \in \mu^c$. 

Now let $(s_0, \iota(w_0)), (s_1, \iota(w_1)), \ldots, (s_n, \iota(w_n))$ be a run of $A_{(q_0, \iota(u))}$ on $\iota(\boldsymbol{\sigma})$. By the “only 
if” part of the proof applied to the inverse automorphism $\iota^{-1}$ we obtain that 
$(q_0, u), (s_1, w_1), \ldots, (s_n, w_n)$ is a run of $A$ on $\boldsymbol{\sigma}$, which completes the proof. 


⁸ This property is also used in the appendix for proving the decidability of an instance of the inclusion 
problem for quasi-regular sets. 

⁹ Here we implicitly extended $\iota$ to $\Sigma \cup \{\#\}$ by putting $\iota(\#) = \#$. 


Proposition 2 (Closure under automorphisms). Let $A = \langle S, q_0, u, \rho, \mu, F \rangle$ be a 
finite-memory automaton. Then for each automorphism $\iota : \Sigma \to \Sigma$ that is an identity on $[u]$ and 
each $\boldsymbol{\sigma} \in \Sigma^*$, $\boldsymbol{\sigma} \in L(A)$ if and only if $\iota(\boldsymbol{\sigma}) \in L(A)$. 


Proof. The result immediately follows from Lemma 1, because, in the conditions of 
Proposition 2, the automata $A_{(q_0, \iota(u))}$ and $A$ coincide. 


Proposition 3 (Indistinguishability property of finite-memory automata). Let $A = 
\langle S, q_0, u, \rho, \mu, F \rangle$ be a finite-memory automaton. If $xy \in L(A)$, then there exists a subset 
$\Sigma' \subseteq [x]$ such that the cardinality of $\Sigma'$ does not exceed $r_A$ and the following holds. For 
any $\sigma \notin \Sigma'$ and any $\tau \notin [y] \cup \Sigma'$, the word $x(y(\sigma|\tau))$ obtained from $xy$ by the substitution of 
$\tau$ for each occurrence of $\sigma$ in $y$ belongs to $L(A)$. 


Proof. Let $x$ be a word of length $i$ and let $(s_0, w_0), (s_1, w_1), \ldots, (s_n, w_n)$ be an accepting 
run of $A$ on $xy$. Let $\Sigma' = [w_i]$, $\sigma \notin [w_i]$, and $\tau \notin [y] \cup \Sigma'$. 

In order to prove that $x(y(\sigma|\tau)) \in L(A)$, it suffices to show that $y(\sigma|\tau) \in L(A_{(s_i, w_i)})$, 
where $A_{(s_i, w_i)} = \langle S, s_i, w_i, \rho, \mu, F \rangle$. Let $\iota$ be the automorphism of $\Sigma$ that permutes $\sigma$ with 
$\tau$ and leaves fixed all other symbols. Then $y(\sigma|\tau) = \iota(y)$, and the result follows from 
Proposition 2, because neither $\sigma$ nor $\tau$ belongs to $[w_i]$. 


Example 4. Consider a language $L_2$ that consists of all words whose last symbol is 
different from all others. That is, 


$$L_2 = \{\sigma_1 \sigma_2 \ldots \sigma_n : \sigma_i \neq \sigma_n, \ i = 1, 2, \ldots, n-1\}.$$


We contend that $L_2$ is not quasi-regular. To prove our contention, assume to the 
contrary that $L_2$ is accepted by an $r$-window finite-memory automaton $A$. Let 
$x = \sigma_1 \sigma_2 \ldots \sigma_r \sigma_{r+1}$ and $y = \sigma_{r+2}$, where all $\sigma_i$s are distinct. Then $xy \in L_2$ ($= L(A)$). Let 
$\Sigma'$ be a subset of $[x]$ provided by Proposition 3. Since the cardinality of $\Sigma'$ does not 
exceed $r$, there exists an $i \in \{1, 2, \ldots, r+1\}$ such that $\sigma_i \notin \Sigma'$. Since $[x] \cap [y] = \emptyset$, it 
follows that $\sigma_i \notin [y] \cup \Sigma'$. By Proposition 3, $x(y(\sigma_{r+2}|\sigma_i)) \in L(A)$. However in the last 
word the symbol $\sigma_i$ appears both in the $i$th and the last positions. This contradicts the 
assumption $L(A) = L_2$. 

Being unable to distinguish between different new input symbols, sometimes 
a finite-memory automaton cannot distinguish between a new symbol and a symbol 
stored in one of its windows. Such a situation occurs when a new input symbol 
replaces an “old” one. Then the behavior of the automaton is exactly like when it 
reads the symbol stored in the “reassignment” window. We illustrate this property of 
finite-memory automata by the following proposition. 


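Proposition 4. If $A$ accepts a word of length $n$, then it accepts a word of length $n$ that 
contains at most $r_A$ distinct symbols. 


Proof. Let $\boldsymbol{\sigma} = \sigma_1 \sigma_2 \ldots \sigma_n \in L(A)$ and let $R = (s_0, w_0), (s_1, w_1), \ldots, (s_n, w_n)$, where 
$w_i = w_{i,1} \ldots w_{i,r}$, $i = 1, 2, \ldots, n$, be a run of $A$ on $\boldsymbol{\sigma}$. Let $m_R(\boldsymbol{\sigma})$ be the minimal integer 
$i$ such that $\sigma_i \notin [w_{i-1}]$ and $w_{i-1, \rho(s_{i-1})} \neq \#$, if such an $i$ exists, and be $\infty$ otherwise. 
That is, if $m_R(\boldsymbol{\sigma}) = i < \infty$, then $\sigma_i$ is “new relatively to $[w_{i-1}]$” and is placed into 
a nonempty window. 

It follows that if $\boldsymbol{\sigma}$ contains more than $r_A$ distinct symbols, then $m_R(\boldsymbol{\sigma}) < \infty$. 
Therefore in order to prove the proposition it suffices to show that if $m_R(\boldsymbol{\sigma}) < \infty$ for 
some accepting run $R$ of $A$ on $\boldsymbol{\sigma}$, then there exists a word $\boldsymbol{\sigma}'$ of length $n$ and an 
accepting run $R'$ of $A$ on $\boldsymbol{\sigma}'$ such that $m_R(\boldsymbol{\sigma}) < m_{R'}(\boldsymbol{\sigma}')$. 

So, let $R$ be an accepting run of $A$ on $\boldsymbol{\sigma}$ such that $m_R(\boldsymbol{\sigma}) = i < \infty$. Let $\iota$ be an 
automorphism of $\Sigma$ such that $\iota(\sigma_i) = w_{i-1, \rho(s_{i-1})}$, $\iota(w_{i-1, \rho(s_{i-1})}) = \sigma_i$, and $\iota(\sigma) = \sigma$ 
for $\sigma \neq \sigma_i, w_{i-1, \rho(s_{i-1})}$. We contend that $R' = (q_0, u), (s_1, w_1), \ldots, (s_{i-1}, w_{i-1}), 
(s_i, \iota(w_i)), \ldots, (s_n, \iota(w_n))$ is an accepting run of $A$ on $\boldsymbol{\sigma}' = \sigma_1 \ldots \sigma_{i-1} \iota(\sigma_i \ldots \sigma_n)$ such that 
$m_R(\boldsymbol{\sigma}) < m_{R'}(\boldsymbol{\sigma}')$. 

Since $R$ is a run of $A$ on $\boldsymbol{\sigma}$, in order to prove that $R'$ is an accepting run of $A$ on $\boldsymbol{\sigma}'$ it 
suffices to show that $((s_{i-1}, w_{i-1}), \iota(\sigma_i), (s_i, \iota(w_i))) \in \mu^c$ and that $(s_i, \iota(w_i)), \ldots, (s_n, \iota(w_n))$ is 
an accepting run of $A_{(s_i, \iota(w_i))} = \langle S, s_i, \iota(w_i), \rho, \mu, F \rangle$ on $\iota(\sigma_{i+1} \ldots \sigma_n)$. Since $\sigma_i$ is stored in 
the $\rho(s_{i-1})$th window, $(s_{i-1}, \rho(s_{i-1}), s_i) \in \mu$, and the relation $((s_{i-1}, w_{i-1}), \iota(\sigma_i), 
(s_i, \iota(w_i))) \in \mu^c$ follows from $\iota(\sigma_i) = w_{i-1, \rho(s_{i-1})}$. Since $(s_i, w_i), \ldots, (s_n, w_n)$ is an accepting 
run of $A_{(s_i, w_i)} = \langle S, s_i, w_i, \rho, \mu, F \rangle$ on $\sigma_{i+1} \ldots \sigma_n$, the fact that $(s_i, \iota(w_i)), \ldots, (s_n, \iota(w_n))$ is 
an accepting run of $A_{(s_i, \iota(w_i))} = \langle S, s_i, \iota(w_i), \rho, \mu, F \rangle$ on $\iota(\sigma_{i+1} \ldots \sigma_n)$ immediately follows 
from the proof of Lemma 1. Finally, the inequality $m_{R'}(\boldsymbol{\sigma}') > m_R(\boldsymbol{\sigma})$ follows from 
$\iota(\sigma_i) = w_{i-1, \rho(s_{i-1})}$. This completes the proof of Proposition 4. □ 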


Remark 1. It follows from the proof of Proposition 4 that if $u$ has $l$ empty windows, 
then $A$ accepts a word of length $n$ that contains at most $l$ distinct symbols not 
belonging to $[u]$. 


Using the indistinguishability property of finite-memory automata given by 
Proposition 4, we can easily prove that the emptiness problem for finite-memory automata 
is decidable. 


Theorem 1. It is decidable whether a quasi-regular language is empty. 

Proof. Let $A$ be a finite-memory automaton and let $\Sigma' = [u_A] \cup \{\sigma_1, \ldots, \sigma_l\}$ be a 
subset of $\Sigma$ of cardinality $r_A$ such that $[u_A] \cap \{\sigma_1, \ldots, \sigma_l\} = \emptyset$. We contend that $L(A) \neq \emptyset$ if 
and only if $L(A) \cap \Sigma'^* \neq \emptyset$. The “if” part is immediate. Let $L(A) \neq \emptyset$. By Remark 1, there 
exists a subset $\Sigma'' = [u] \cup \{\tau_1, \ldots, \tau_l\}$ of $\Sigma$ such that $L(A) \cap \Sigma''^* \neq \emptyset$. Let $\iota$ be an 
automorphism of $\Sigma$ such that $\iota(\sigma_i) = \tau_i$, $\iota(\tau_i) = \sigma_i$ for $i = 1, \ldots, l$, and $\iota(\sigma) = \sigma$ for $\sigma \neq \sigma_i, \tau_i$. 
Then, 


$$L(A) \cap \Sigma'^* = L(A) \cap \iota(\Sigma''^*) = \iota(L(A) \cap \Sigma''^*),$$


where the second equality follows from Proposition 2. Since $L(A) \cap \Sigma''^* \neq \emptyset$, 
$L(A) \cap \Sigma'^* \neq \emptyset$ as well. 

Now the decidability of the emptiness problem follows from the above contention 
and Proposition 1. 
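
The proof suggests a direct decision procedure, sketched below under stated assumptions: the input alphabet is restricted to the symbols of the initial assignment plus one fresh symbol per empty window, and the finitely many configurations over that alphabet are explored breadth-first. The representation (dictionary for $\rho$, set of triples for $\mu$, tuples `("fresh", i)` as fresh symbols) and the names `successors` and `is_empty` are illustrative choices, not taken from the paper.

```python
from collections import deque

EMPTY = "#"  # assumed marker of an empty window


def successors(config, symbol, rho, mu):
    """One step of the induced relation on configurations."""
    state, windows = config
    if symbol in windows:
        k = windows.index(symbol) + 1
        new_windows = windows
    elif state in rho:
        k = rho[state]
        new_windows = windows[:k - 1] + (symbol,) + windows[k:]
    else:
        return set()
    return {(t, new_windows) for (s, j, t) in mu if s == state and j == k}


def is_empty(q0, u, rho, mu, final):
    """Decide L(A) = {} by exploring configurations over a finite alphabet."""
    u = tuple(u)
    stored = [x for x in u if x != EMPTY]
    # One fresh symbol per empty window; tuples cannot clash with stored strings.
    fresh = [("fresh", i) for i in range(len(u) - len(stored))]
    alphabet = stored + fresh
    start = (q0, u)
    seen = {start}
    queue = deque([start])
    while queue:
        config = queue.popleft()
        if config[0] in final:
            return False  # a final configuration is reachable
        for symbol in alphabet:
            for nxt in successors(config, symbol, rho, mu):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return True


# Illustrative automaton for "some symbol repeats"; its language is nonempty.
RHO = {"q0": 1, "q": 2, "f": 2}
MU = {("q0", 1, "q0"), ("q0", 1, "q"), ("q", 1, "f"),
      ("q", 2, "q"), ("f", 1, "f"), ("f", 2, "f")}
print(is_empty("q0", "##", RHO, MU, {"f"}))
```

The search terminates because there are only finitely many states and finitely many window contents over the restricted alphabet, which is the finiteness observation behind Proposition 1.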


Proposition 5. Quasi-regular sets are not closed under complementation. 


Proof. Let $L_1$ be the quasi-regular language from Example 1. Then $\bar{L}_1$, the 
complement of $L_1$ to $\Sigma^*$, consists of all words where each symbol appears at most one time. 
We contend that this language is not quasi-regular. Assume to the contrary that there 
exists a finite-memory automaton $A'$ such that $\bar{L}_1 = L(A')$. Since $\Sigma$ is infinite, there 
exists a word $\boldsymbol{\sigma} \in L(A')$ of length $r_{A'} + 1$. By Proposition 4, $A'$ accepts a word $\boldsymbol{\sigma}'$ of length 
$r_{A'} + 1$ that contains at most $r_{A'}$ distinct symbols. Therefore some symbol of $\Sigma$ must 
appear in $\boldsymbol{\sigma}'$ more than one time, in contradiction with the assumption 
$\bar{L}_1 = L(A')$. □ 


Remark 2. The proof of Proposition 5 can be easily extended to show that no 
machine which is able to remember only a fixed number of symbols accepts $\bar{L}_1$. 
Therefore nonclosure under complementation might, in some sense, seem to be 
a “natural” property of families of languages containing $L_1$ and defined by simple 
machines over infinite alphabets. It should be mentioned, however, that finite-memory 
automata have a natural deterministic analog, and the languages accepted by 
deterministic automata are closed under complementation (since we can simply 
replace $F$ by $S - F$). Thus the classes of languages accepted by deterministic and 
nondeterministic finite-memory automata are different. Deterministic finite-memory 
automata are considered in more detail in Section 4. 


3. Closure properties of quasi-regular languages 


In this section we consider closure properties of quasi-regular languages. The proofs 
are based on the standard construction adapted to a slightly modified version of 
finite-memory automata that allows the possibility of relating the next input symbol 
to many windows simultaneously. 




Definition 2. A finite-memory automaton with multiple assignment, or shortly 
M-automaton, over $\Sigma$ is a system $A = \langle S, q_0, u, \rho, \mu, F \rangle$, where 

• $S$ is a finite set of states. 

• $q_0 \in S$ is the initial state. 

• $u = w_1 w_2 \ldots w_r \in (\Sigma \cup \{\#\})^r$ is the initial M-assignment. (Notice that an assignment 
for an M-automaton can be any word in $(\Sigma \cup \{\#\})^r$, i.e., the pairwise distinctness 
condition is relaxed.) 

• $\rho : S \to 2^{\{1, 2, \ldots, r\}} - \{\emptyset\}$ is a function from $S$ to the set of all nonempty subsets of 
$\{1, 2, \ldots, r\}$, called the M-reassignment. The intuitive meaning of $\rho$ is as follows. If $A$ is 
in state $s$, then it replaces the contents of the windows indexed by the elements of $\rho(s)$ 
by the input symbol. 

• $\mu \subseteq S \times (2^{\{1, 2, \ldots, r\}} - \{\emptyset\}) \times S$ is the M-transition relation. The intuitive meaning of 
$\mu$ is as follows. If $A$ is in state $s$, $P$ is the set of indices of the windows containing the 
input symbol (after the reassignment is made), and $(s, P, t) \in \mu$, then $A$ may enter 
state $t$. 

• $F \subseteq S$ is the set of final states. 

An M-configuration of an M-automaton $A$ is a pair $(s, w)$, where $s \in S$ and 
$w \in (\Sigma \cup \{\#\})^r$. As in the case of finite-memory automata, the set of all 
M-configurations of $A$ is denoted by $S^c$, i.e., $S^c = S \times (\Sigma \cup \{\#\})^r$. The pair $q_0^c = (q_0, u)$ is called the 
initial M-configuration, and the elements of $F \times (\Sigma \cup \{\#\})^r$ are called final 
M-configurations. The set of final M-configurations is denoted by $F^c$. 

The relation $\mu$ induces the following relation $\mu^c$ on $S^c \times \Sigma \times S^c$. 

Let $w = w_1 w_2 \ldots w_r$ and $v = v_1 v_2 \ldots v_r$. Then $((s, w), \sigma, (t, v)) \in \mu^c$ if and only if the 
following conditions are satisfied. 

• If $k \notin \rho(s)$, then $v_k = w_k$. 

• If $k \in \rho(s)$, then $v_k = \sigma$. 

• $(s, \{k : v_k = \sigma\}, t) \in \mu$. 

The first two conditions state that the input symbol is placed into the windows 
whose indices belong to $\rho(s)$, and the last condition, in particular, states that the 
reassignment is made before the automaton changes the current state. 

Let $\boldsymbol{\sigma} = \sigma_1 \sigma_2 \ldots \sigma_n$ be a word over $\Sigma$. A run of $A$ on $\boldsymbol{\sigma}$ consists of a sequence of 
configurations $c_0, c_1, \ldots, c_n$ such that $c_0 = q_0^c$ and $(c_{i-1}, \sigma_i, c_i) \in \mu^c$, $i = 1, 2, \ldots, n$. 

We say that $A$ accepts $\boldsymbol{\sigma} \in \Sigma^*$, if there exists a run $c_0, c_1, \ldots, c_n$ of $A$ on $\boldsymbol{\sigma}$ such that 
$c_n \in F^c$. 


Example 5. Consider an M-automaton $A^M = \langle \{q_0, q, f\}, q_0, \#\#, \rho^M, \mu^M, 
\{f\} \rangle$, where 
• $\rho^M(q_0) = \{1\}$, $\rho^M(q) = \rho^M(f) = \{2\}$; and 
• $\mu^M = \{(q_0, \{1\}, q_0), (q_0, \{1\}, q), (q, \{1, 2\}, f), (q, \{2\}, q), (f, \{1, 2\}, f), (f, \{2\}, f)\}$. 
This automaton has the self-explanatory graph representation shown in Fig. 3. 
One can easily verify that $L(A^M) = L_1$, where $L_1$ is the language from Example 1. 
The behavior of $A^M$ on words where some element of $\Sigma$ appears twice or more is 
similar to that of the automaton $A$ from Example 1. Being in the initial state, $A^M$ 
stores new input symbols in the first window. When reading a symbol that appears 
twice, $A^M$ changes the state to $q$ and, storing new input symbols in the second window, 
waits for the second appearance of the symbol stored in the first window. Then it 
enters the final state. For example, $abcbd \in L(A^M)$, and an accepting run of $A^M$ on 
$abcbd$ is $(q_0, \#\#), (q_0, a\#), (q, b\#), (q, bc), (f, bb), (f, bd)$, compare with Example 1. 


Fig. 3. (Graph representation of the M-automaton of Example 5; the drawing is not reproduced here.) 
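
The M-automaton step differs from the earlier model in that the reassignment is always carried out first and the transition is then chosen by the whole set of windows holding the input symbol. The following is a minimal sketch under the assumption that $\rho$ maps states to Python sets and that the middle component of each transition is a `frozenset`; the names `m_step`, `m_accepts`, `RHO_M` and `MU_M` are illustrative, and the data is our reading of Example 5.

```python
def m_step(config, symbol, rho, mu):
    """All M-configurations reachable from `config` on `symbol`."""
    state, windows = config
    # Reassignment first: every window indexed by rho(state) receives the symbol.
    new_windows = tuple(symbol if k in rho[state] else w
                        for k, w in enumerate(windows, start=1))
    # P = indices of the windows that now hold the input symbol.
    present = frozenset(k for k, w in enumerate(new_windows, start=1)
                        if w == symbol)
    return {(t, new_windows) for (s, p, t) in mu
            if s == state and p == present}


def m_accepts(word, q0, u, rho, mu, final):
    """Track every M-configuration reachable on each prefix of `word`."""
    configs = {(q0, tuple(u))}
    for symbol in word:
        configs = {nxt for c in configs for nxt in m_step(c, symbol, rho, mu)}
    return any(state in final for state, _ in configs)


RHO_M = {"q0": {1}, "q": {2}, "f": {2}}
MU_M = {("q0", frozenset({1}), "q0"), ("q0", frozenset({1}), "q"),
        ("q", frozenset({1, 2}), "f"), ("q", frozenset({2}), "q"),
        ("f", frozenset({1, 2}), "f"), ("f", frozenset({2}), "f")}

print(m_accepts("abcbd", "q0", "##", RHO_M, MU_M, {"f"}))
print(m_step(("q", ("b", "c")), "b", RHO_M, MU_M))
```

The second call reproduces the step from $(q, bc)$ to $(f, bb)$ in the run above: the symbol $b$ is copied into window 2, so it then sits in both windows and the transition labeled $\{1, 2\}$ applies.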


Remark 3. Let $A = \langle S, q_0, u, \rho, \mu, F \rangle$ be an M-automaton. Introducing a new state, if 
necessary, we may assume that for each $s \in S$ and each nonempty $P \subseteq \{1, 2, \ldots, r_A\}$ there exists $t \in S$ 
such that $(s, P, t) \in \mu$, i.e., $A$ can always make the next move. Indeed, let $q \notin S$. Consider 
an M-automaton $A' = \langle S \cup \{q\}, q_0, u, \rho', \mu', F \rangle$ such that $\mu' = \mu \cup \{(s, P, q) : 
(\{s\} \times \{P\} \times S) \cap \mu = \emptyset\} \cup (\{q\} \times (2^{\{1, 2, \ldots, r_A\}} - \{\emptyset\}) \times \{q\})$; $\rho'(s) = \rho(s)$ for $s \in S$, and 
$\rho'(q) = \{1, 2, \ldots, r_A\}$. Obviously, $A'$ can always make the next move. Also it cannot 
leave $q$, and can enter $q$ if and only if $A$ cannot make the next move. Thus $A$ and $A'$ 
accept the same language. 


Theorem 2. A language is quasi-regular if and only if it is accepted by an M-automaton. 


Proof. We start with the proof of the “only if” part of the theorem. Let $A = \langle S, q_0, u, \rho, \mu, F \rangle$ be a finite-memory automaton with $r$ windows. We construct an $(r+1)$-window M-automaton $A^M$ that simulates $A$. At each stage of the computation, $r$ of the $r+1$ windows of $A^M$ are in one-to-one correspondence with the windows of $A$, and the only window of $A^M$ that is not in the range of that correspondence contains the input symbol. Thus after the reassignment is made, the input symbol appears either in one or in two windows. In the former case, the input symbol is a “new one” (relative to the current contents of the windows), and in the latter case the input symbol already appears in one of the windows of $A$. This fact will be used by $A^M$ to simulate the behavior of $A$. In particular, if the input symbol is new, then the correspondence is changed so that the window containing the input symbol corresponds to the window of $A$ containing that symbol (after the reassignment). Otherwise the correspondence remains unchanged.
A formal description of $A^M$ is as follows. Let $\Pi_{r+1}$ denote the group of all the permutations of $\{1, 2, \ldots, r+1\}$. Then $A^M = \langle S^M, q_0^M, u^M, \rho^M, \mu^M, F^M \rangle$, where
• $S^M = S \times \Pi_{r+1}$. The meaning of a state $(s, \pi)$ of $A^M$ is as follows. If $A^M$ is in the configuration $((s, \pi), v_1 v_2 \cdots v_{r+1})$, then $A$ is in the configuration $(s, v_{\pi(1)} v_{\pi(2)} \cdots v_{\pi(r)})$.
• $q_0^M = (q_0, \pi_{id})$, where $\pi_{id}$ is the identity permutation, i.e., $\pi_{id}(k) = k$, $k = 1, 2, \ldots, r+1$.
• $u^M = u\#$.
• $\rho^M((s, \pi)) = \{\pi(r+1)\}$, i.e., the input symbol is always placed into the window that does not belong to the range of the correspondence with the windows of $A$.
• $\mu^M = \{((s, \pi), \{\pi(k), \pi(r+1)\}, (t, \pi)) : (s, k, t) \in \mu\} \cup \{((s, \pi), \{\pi(r+1)\}, (t, \pi\pi_{\rho(s)})) : (s, \rho(s), t) \in \mu\}$, where $\pi_{\rho(s)}$ is the transposition of $\rho(s)$ and $r+1$. That is, $\pi_{\rho(s)}(\rho(s)) = r+1$, $\pi_{\rho(s)}(r+1) = \rho(s)$, and $\pi_{\rho(s)}(k) = k$ for $k \neq \rho(s), r+1$. In accordance with the informal explanation preceding the definition of $A^M$, the first operand in the union expression for $\mu^M$ corresponds to the case when the input symbol appears in the windows of $A$, and the second operand corresponds to the case when the symbol is new. In both cases the transition of $A^M$ corresponds to that of $A$.
• $F^M = F \times \Pi_{r+1}$.
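The translation from a finite-memory automaton to an M-automaton can be checked on small inputs. The sketch below is illustrative only: the function names, the tuple encoding of a permutation $\pi$ as `(pi(1), ..., pi(r+1))`, the character `#` for an empty window, and the three-state test automaton (a hypothetical finite-memory automaton intended to accept words with a repeated symbol) are all assumptions. The product $\pi\pi_{\rho(s)}$ is read as "apply the transposition first", which amounts to swapping two entries of the tuple.

```python
from itertools import permutations, product


def fma_accepts(word, q0, u, rho, mu, final):
    """Nondeterministic finite-memory automaton; rho: state -> window."""
    configs = {(q0, tuple(u))}
    for sym in word:
        nxt = set()
        for s, w in configs:
            if sym in w:                       # symbol found in window k
                k, v = w.index(sym) + 1, w
            elif s in rho:                     # new symbol: reassign rho(s)
                k = rho[s]
                v = w[:k - 1] + (sym,) + w[k:]
            else:
                continue
            nxt |= {(t, v) for (a, j, t) in mu if a == s and j == k}
        configs = nxt
    return any(s in final for s, _ in configs)


def m_accepts(word, q0, u, rho, mu, final):
    """Nondeterministic M-automaton; rho: state -> set of windows."""
    configs = {(q0, tuple(u))}
    for sym in word:
        nxt = set()
        for s, w in configs:
            v = tuple(sym if k + 1 in rho[s] else x for k, x in enumerate(w))
            p = frozenset(k + 1 for k, x in enumerate(v) if x == sym)
            nxt |= {(t, v) for (a, windows, t) in mu if a == s and windows == p}
        configs = nxt
    return any(s in final for s, _ in configs)


def to_m_automaton(states, q0, u, rho, mu, final):
    """Build the (r+1)-window M-automaton of the construction above."""
    r = len(u)
    perms = list(permutations(range(1, r + 2)))   # perms[0] is the identity
    rho_m = {(s, pi): {pi[r]} for s in states for pi in perms}
    mu_m = set()
    for (s, k, t) in mu:
        for pi in perms:
            # First operand: the symbol already sits in window k of A.
            mu_m.add(((s, pi), frozenset({pi[k - 1], pi[r]}), (t, pi)))
            if rho.get(s) == k:
                # Second operand: new symbol; swap the roles of window
                # rho(s) and the spare window.
                swapped = list(pi)
                swapped[k - 1], swapped[r] = pi[r], pi[k - 1]
                mu_m.add(((s, pi), frozenset({pi[r]}), (t, tuple(swapped))))
    final_m = {(s, pi) for s in final for pi in perms}
    return (q0, perms[0]), tuple(u) + ("#",), rho_m, mu_m, final_m


# Hypothetical two-window automaton for "some symbol occurs twice".
STATES = {"q0", "q", "f"}
FMA = ("q0", "##", {"q0": 1, "q": 2, "f": 2},
       {("q0", 1, "q0"), ("q0", 1, "q"), ("q", 2, "q"),
        ("q", 1, "f"), ("f", 1, "f"), ("f", 2, "f")}, {"f"})
M = to_m_automaton(STATES, *FMA)

# Compare both automata on every word over {a, b, c} of length at most 4.
agree = all(fma_accepts(w, *FMA) == m_accepts(w, *M)
            for n in range(5) for w in product("abc", repeat=n))
```

The comparison is exhaustive only over a three-letter alphabet and short words, so it is a sanity check of the construction, not a proof.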

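We contend that $L(A) = L(A^M)$. Let $\sigma = \sigma_1\sigma_2\cdots\sigma_n \in L(A)$ and let $(s_0, w_0), (s_1, w_1), \ldots, (s_n, w_n)$ be an accepting run of $A$ on $\sigma$. We transform it into an accepting run $((s_0, \pi_0), v_0), ((s_1, \pi_1), v_1), \ldots, ((s_n, \pi_n), v_n)$ of $A^M$ on $\sigma$ by induction as follows. Let $((s_0, \pi_0), v_0) = (q_0^M, u^M)$ and assume that $\pi_i$ and $v_i$ have been defined so that $v_{i,\pi_i(k)} = w_{i,k}$, $k = 1, 2, \ldots, r$. (Here and hereafter we write $w_i = w_{i,1} w_{i,2} \cdots w_{i,r}$ and $v_i = v_{i,1} v_{i,2} \cdots v_{i,r+1}$.) As usual, we distinguish between the cases of $\sigma_{i+1} \in [w_i]$ and $\sigma_{i+1} \notin [w_i]$.

Let $\sigma_{i+1} \in [w_i]$. Then for $k \neq r+1$, $v_{i+1,\pi_i(k)} = v_{i,\pi_i(k)}$ and $v_{i+1,\pi_i(r+1)} = \sigma_{i+1}$. Let $\pi_{i+1} = \pi_i$. Since for some $k = 1, 2, \ldots, r$, $\sigma_{i+1} = w_{i,k}$, it follows that $(s_i, k, s_{i+1}) \in \mu$. By the induction hypothesis, $v_{i,\pi_i(k)} = \sigma_{i+1}$, which implies that $((s_i, \pi_i), \{\pi_i(k), \pi_i(r+1)\}, (s_{i+1}, \pi_{i+1}))$ belongs to the first operand in the union expression for $\mu^M$. Thus $(((s_i, \pi_i), v_i), \sigma_{i+1}, ((s_{i+1}, \pi_{i+1}), v_{i+1})) \in (\mu^M)^c$.

Let $\sigma_{i+1} \notin [w_i]$. Then $(s_i, \rho(s_i), s_{i+1}) \in \mu$. The components of $v_{i+1}$ are $v_{i+1,\pi_i(k)} = v_{i,\pi_i(k)}$ for $k \neq r+1$, and $v_{i+1,\pi_i(r+1)} = \sigma_{i+1}$. We put $\pi_{i+1} = \pi_i \pi_{\rho(s_i)}$. Therefore, by the induction hypothesis, for $k \neq \rho(s_i), r+1$, $v_{i+1,\pi_{i+1}(k)} = v_{i,\pi_i(k)} = w_{i,k} = w_{i+1,k}$ and, by the definition of $\pi_{i+1}$, $v_{i+1,\pi_{i+1}(\rho(s_i))} = \sigma_{i+1} = w_{i+1,\rho(s_i)}$. Since $\sigma_{i+1}$ is a new symbol (relative to $[w_i]$), $(s_i, \rho(s_i), s_{i+1}) \in \mu$. This together with $\{k : v_{i+1,k} = \sigma_{i+1}\} = \{\pi_i(r+1)\}$ implies that $((s_i, \pi_i), \{k : v_{i+1,k} = \sigma_{i+1}\}, (s_{i+1}, \pi_{i+1}))$ belongs to the second operand in the union expression for $\mu^M$. Thus $(((s_i, \pi_i), v_i), \sigma_{i+1}, ((s_{i+1}, \pi_{i+1}), v_{i+1})) \in (\mu^M)^c$, which proves the inclusion $L(A) \subseteq L(A^M)$.

Conversely, let $\sigma = \sigma_1\sigma_2\cdots\sigma_n \in L(A^M)$ and let $((s_0, \pi_0), v_0), ((s_1, \pi_1), v_1), \ldots, ((s_n, \pi_n), v_n)$ be an accepting run of $A^M$ on $\sigma$. We contend that $(s_0, w_0), (s_1, w_1), \ldots, (s_n, w_n)$, where $w_{i,k} = v_{i,\pi_i(k)}$, $k = 1, 2, \ldots, r$, $i = 0, 1, \ldots, n$, is an accepting run of $A$ on $\sigma$. In order to show that $((s_i, w_i), \sigma_{i+1}, (s_{i+1}, w_{i+1})) \in \mu^c$ we (as usual) distinguish between the cases of $\sigma_{i+1} \in [w_i]$ and $\sigma_{i+1} \notin [w_i]$.

Let $\sigma_{i+1} \in [w_i]$. Then for some $k \neq r+1$, $\sigma_{i+1} = w_{i,k} = v_{i,\pi_i(k)}$. Since $(((s_i, \pi_i), v_i), \sigma_{i+1}, ((s_{i+1}, \pi_{i+1}), v_{i+1})) \in (\mu^M)^c$, we have $((s_i, \pi_i), \{\pi_i(k), \pi_i(r+1)\}, (s_{i+1}, \pi_{i+1})) \in \mu^M$. Therefore $(s_i, k, s_{i+1}) \in \mu$, and $\pi_i = \pi_{i+1}$. This implies $w_{i+1} = w_i$ which, in turn, implies $((s_i, w_i), \sigma_{i+1}, (s_{i+1}, w_{i+1})) \in \mu^c$.

Let $\sigma_{i+1} \notin [w_i]$. Then, by the induction hypothesis, the only symbol equal to $\sigma_{i+1}$ in $v_{i+1}$ is $v_{i+1,\pi_i(r+1)}$. Since $(((s_i, \pi_i), v_i), \sigma_{i+1}, ((s_{i+1}, \pi_{i+1}), v_{i+1})) \in (\mu^M)^c$, we have $((s_i, \pi_i), \{\pi_i(r+1)\}, (s_{i+1}, \pi_{i+1})) \in \mu^M$. Therefore, by the definition of $\mu^M$, $\pi_{i+1} = \pi_i \pi_{\rho(s_i)}$ and $(s_i, \rho(s_i), s_{i+1}) \in \mu$. This implies $w_{i+1,k} = w_{i,k}$ for $k \neq \rho(s_i)$, and $w_{i+1,\rho(s_i)} = \sigma_{i+1}$. Therefore $((s_i, w_i), \sigma_{i+1}, (s_{i+1}, w_{i+1})) \in \mu^c$. This completes the proof of the equality $L(A) = L(A^M)$.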

Now let $A^M = \langle S^M, q_0^M, u^M, \rho^M, \mu^M, F^M \rangle$ be an M-automaton with $r$ windows. We construct an $r$-window finite-memory automaton $A$ that simulates $A^M$. An M-assignment $w = w_1 w_2 \cdots w_r$ of $A^M$ is represented by an assignment $v = v_1 v_2 \cdots v_r$ of $A$ such that $[w] \subseteq [v]$, and a “partition” $p_1, p_2, \ldots, p_r$ of $\{1, 2, \ldots, r\}$ such that $w_l = v_k$ for $l \in p_k$.

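A formal description of $A = \langle S, q_0, u, \rho, \mu, F \rangle$ is as follows. Let $P_r$ denote the set of all $r$-dimensional vectors $p = (p_1, p_2, \ldots, p_r) \in (2^{\{1,2,\ldots,r\}})^r$ such that $\bigcup_{k=1}^{r} p_k = \{1, 2, \ldots, r\}$ and $p_i \cap p_j = \emptyset$ for $i \neq j$. Then
• $S = S^M \times P_r$.
• $q_0 = (q_0^M, p^M)$, where $p^M = (p_1^M, p_2^M, \ldots, p_r^M)$ is defined as follows. Let $u^M = u_1^M u_2^M \cdots u_r^M$, and $[u^M] = \{u_1, u_2, \ldots, u_{r'}\}$, where $u_i \neq u_j$ for $i \neq j$. Then, for $k \leq r'$, $p_k^M = \{j : u_j^M = u_k\}$, and for $k > r'$, $p_k^M = \emptyset$.
• $u = u_1 u_2 \cdots u_{r'} \#^{r-r'}$, where $u_1, u_2, \ldots, u_{r'}$ are as above.
• $\rho((s, (p_1, p_2, \ldots, p_r)))$ is the minimal integer $k$ for which $p_k \subseteq \rho^M(s)$. Since $\rho^M(s) \neq \emptyset$, and either $(p_1, p_2, \ldots, p_r)$ has an empty component, or each of its components is a one-element set, $\rho((s, (p_1, p_2, \ldots, p_r)))$ is always defined.
• $\mu$ consists of all triples $((s, (p_1, p_2, \ldots, p_r)), k, (t, (p_1', p_2', \ldots, p_r')))$ such that $p_k' = p_k \cup \rho^M(s)$, $p_{k'}' = p_{k'} - \rho^M(s)$ for $k' \neq k$, and $(s, p_k', t) \in \mu^M$. Thus $\mu$ reflects $\mu^M$ together with $\rho^M$.
• $F = F^M \times P_r$.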

We contend that $L(A) = L(A^M)$. Let $\sigma = \sigma_1\sigma_2\cdots\sigma_n \in L(A^M)$ and let $(s_0, w_0), (s_1, w_1), \ldots, (s_n, w_n)$ be an accepting run of $A^M$ on $\sigma$. We transform it into an accepting run $((s_0, p_0), v_0), ((s_1, p_1), v_1), \ldots, ((s_n, p_n), v_n)$ of $A$ on $\sigma$ by induction as follows. Let $((s_0, p_0), v_0) = (q_0, u)$ and assume that $p_i = (p_{i,1}, p_{i,2}, \ldots, p_{i,r})$ and $v_i = v_{i,1} v_{i,2} \cdots v_{i,r}$ have been defined such that for each $k = 1, 2, \ldots, r$ and each $l \in p_{i,k}$, $w_{i,l} = v_{i,k}$. Notice that, by the definition of $q_0$ and $u$, this condition is satisfied for $i = 0$.

Let $k$ be such that $\sigma_{i+1} = v_{i,k}$, if $\sigma_{i+1} \in [v_i]$, and $k = \rho((s_i, p_i))$, if $\sigma_{i+1} \notin [v_i]$. We define $p_{i+1}$ by $p_{i+1,k} = p_{i,k} \cup \rho^M(s_i)$ and $p_{i+1,k'} = p_{i,k'} - \rho^M(s_i)$ for $k' \neq k$, and $v_{i+1}$ by $v_{i+1,k} = \sigma_{i+1}$ and $v_{i+1,k'} = v_{i,k'}$ for $k' \neq k$. Then, since $((s_i, w_i), \sigma_{i+1}, (s_{i+1}, w_{i+1})) \in (\mu^M)^c$, it follows from the definitions of $w_{i+1}$ and $v_{i+1}$ that for each $k = 1, 2, \ldots, r$ and each $l \in p_{i+1,k}$, $w_{i+1,l} = v_{i+1,k}$, i.e., the induction hypothesis is satisfied for $i+1$. Also $(s_i, p_{i+1,k}, s_{i+1}) \in \mu^M$, which, by the definition of $\mu$, implies $((s_i, p_i), k, (s_{i+1}, p_{i+1})) \in \mu$. Thus $(((s_i, p_i), v_i), \sigma_{i+1}, ((s_{i+1}, p_{i+1}), v_{i+1})) \in \mu^c$, which proves the inclusion $L(A^M) \subseteq L(A)$.

Conversely, let $\sigma = \sigma_1\sigma_2\cdots\sigma_n \in L(A)$ and let $((s_0, p_0), v_0), ((s_1, p_1), v_1), \ldots, ((s_n, p_n), v_n)$ be an accepting run of $A$ on $\sigma$. We contend that $(s_0, w_0), (s_1, w_1), \ldots, (s_n, w_n)$, where $w_{i,l} = v_{i,k}$ for $l \in p_{i,k}$, is an accepting run of $A^M$ on $\sigma$. By definition, $w_0 = u^M$. We have to prove that for each $i = 0, 1, \ldots, n-1$, $((s_i, w_i), \sigma_{i+1}, (s_{i+1}, w_{i+1})) \in (\mu^M)^c$. Let $\sigma_{i+1} = v_{i+1,k}$. Then $((s_i, p_i), k, (s_{i+1}, p_{i+1})) \in \mu$. By the definition of $\mu$, $p_{i+1,k} = p_{i,k} \cup \rho^M(s_i)$, $p_{i+1,k'} = p_{i,k'} - \rho^M(s_i)$ for $k' \neq k$, and $(s_i, p_{i+1,k}, s_{i+1}) \in \mu^M$. It follows from the definition of the $w_i$'s that $w_{i+1,l} = w_{i,l}$ for $l \notin \rho^M(s_i)$, and $w_{i+1,l} = v_{i+1,k} = \sigma_{i+1}$ for $l \in \rho^M(s_i)$. Thus, by the definition of $(\mu^M)^c$, $((s_i, w_i), \sigma_{i+1}, (s_{i+1}, w_{i+1})) \in (\mu^M)^c$, which completes the proof of the theorem. □


Corollary. Every quasi-regular language is accepted by a finite-memory automaton 
whose reassignment function is everywhere defined. 


Proof. Let $L$ be a quasi-regular language. Then, by Theorem 2, $L = L(A^M)$ for some M-automaton $A^M$. The finite-memory automaton $A$ constructed from $A^M$ in the proof of Theorem 2 satisfies the property required by the corollary. □


Now, in view of Theorem 2, when dealing with closure properties we may consider 
M-automata. 


Theorem 3. The quasi-regular sets are closed under union, intersection, concatenation 
and Kleene star. 


Proof. Let $A^1 = \langle S^1, q_0^1, u^1, \rho^1, \mu^1, F^1 \rangle$ and $A^2 = \langle S^2, q_0^2, u^2, \rho^2, \mu^2, F^2 \rangle$ be M-automata, where $u^1$ and $u^2$ are words of length $r_1$ and $r_2$, respectively.

Closure under union. The proof is based on the standard product construction. By Remark 3, we may assume that $A^1$ and $A^2$ can always make the next move. Let $A^{\cup} = \langle S^1 \times S^2, (q_0^1, q_0^2), u^1 u^2, \rho^{\cup}, \mu^{\cup}, (F^1 \times S^2) \cup (S^1 \times F^2) \rangle$ be an M-automaton such that
• $\rho^{\cup}((s^1, s^2)) = \rho^1(s^1) \cup \{k + r_1 : k \in \rho^2(s^2)\}$, i.e., the M-reassignment on the first $r_1$ windows is that of $A^1$ and on the last $r_2$ windows is that of $A^2$; and
• $\mu^{\cup} = \{((s^1, s^2), P^1 \cup \{k + r_1 : k \in P^2\}, (t^1, t^2)) : (s^1, P^1, t^1) \in \mu^1 \text{ and } (s^2, P^2, t^2) \in \mu^2\}$.

The automaton $A^{\cup}$ simultaneously simulates $A^1$ on the first $r_1$ symbols of the assignments and $A^2$ on the last $r_2$ symbols of the assignments. Thus $L(A^1) \cup L(A^2) = L(A^{\cup})$.
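The product construction for union can be sketched in Python as follows. The sketch is illustrative only: an automaton is assumed to be a tuple `(S, q0, u, rho, mu, F)` with the initial assignment `u` given as a string, window sets as `frozenset`s, and `#` as the empty window; the helper names and the second automaton (a hypothetical one-window automaton accepting exactly the words of length one) are not from the paper. Both automata are completed first, as Remark 3 requires.

```python
from itertools import combinations


def m_accepts(word, a):
    """Simulate a nondeterministic M-automaton a = (S, q0, u, rho, mu, F)."""
    _, q0, u, rho, mu, final = a
    configs = {(q0, tuple(u))}
    for sym in word:
        nxt = set()
        for s, w in configs:
            v = tuple(sym if k + 1 in rho[s] else x for k, x in enumerate(w))
            p = frozenset(k + 1 for k, x in enumerate(v) if x == sym)
            nxt |= {(t, v) for (b, windows, t) in mu if b == s and windows == p}
        configs = nxt
    return any(s in final for s, _ in configs)


def complete(a, sink):
    """Remark 3: add a non-final sink so that a move is always possible."""
    S, q0, u, rho, mu, F = a
    r = len(u)
    subsets = [frozenset(c) for n in range(1, r + 1)
               for c in combinations(range(1, r + 1), n)]
    enabled = {(s, p) for (s, p, _) in mu}
    mu2 = set(mu) | {(s, p, sink) for s in S for p in subsets
                     if (s, p) not in enabled}
    mu2 |= {(sink, p, sink) for p in subsets}
    rho2 = {**rho, sink: frozenset(range(1, r + 1))}
    return S | {sink}, q0, u, rho2, mu2, F


def union(a1, a2):
    """Product automaton: A1 uses the first r1 windows, A2 the last r2."""
    (S1, q1, u1, rho1, mu1, F1), (S2, q2, u2, rho2, mu2, F2) = a1, a2
    r1 = len(u1)

    def shift(ws):
        return frozenset(k + r1 for k in ws)

    states = {(s, t) for s in S1 for t in S2}
    rho = {(s, t): frozenset(rho1[s]) | shift(rho2[t]) for (s, t) in states}
    mu = {((s1, s2), frozenset(p1) | shift(p2), (t1, t2))
          for (s1, p1, t1) in mu1 for (s2, p2, t2) in mu2}
    final = {(s, t) for (s, t) in states if s in F1 or t in F2}
    return states, (q1, q2), u1 + u2, rho, mu, final


fs = frozenset
A1 = ({"q0", "q", "f"}, "q0", "##", {"q0": {1}, "q": {2}, "f": {2}},
      {("q0", fs({1}), "q0"), ("q0", fs({1}), "q"), ("q", fs({2}), "q"),
       ("q", fs({1, 2}), "f"), ("f", fs({2}), "f"), ("f", fs({1, 2}), "f")},
      {"f"})                                    # Example 5
A2 = ({"p0", "p1"}, "p0", "#", {"p0": {1}, "p1": {1}},
      {("p0", fs({1}), "p1")}, {"p1"})           # hypothetical: length-1 words
A_UNION = union(complete(A1, "d1"), complete(A2, "d2"))
```

Replacing the set of final states by `{(s, t) for (s, t) in states if s in F1 and t in F2}` would give the corresponding sketch for intersection.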

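Closure under intersection. We use the product construction one more time. Let $A^{\cap} = \langle S^1 \times S^2, (q_0^1, q_0^2), u^1 u^2, \rho^{\cap}, \mu^{\cap}, F^1 \times F^2 \rangle$ be an M-automaton, where $\rho^{\cap}$ and $\mu^{\cap}$ are defined in the same way as $\rho^{\cup}$ and $\mu^{\cup}$ above.

Since $A^{\cap}$ simultaneously simulates $A^1$ on the first $r_1$ symbols of the assignments and $A^2$ on the last $r_2$ symbols of the assignments, $L(A^1) \cap L(A^2) = L(A^{\cap})$.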


Closure under concatenation. Renaming the states of $A^2$, if necessary, we may assume that $S^1 \cap S^2 = \emptyset$. Let $A^{1\cdot 2} = \langle S^1 \cup S^2, q_0^1, u^1 u^2, \rho^{1\cdot 2}, \mu^{1\cdot 2}, F^2 \rangle$ be an M-automaton such that
• $\rho^{1\cdot 2}(s) = \rho^1(s)$, for $s \in S^1$, and $\rho^{1\cdot 2}(s) = \{k + r_1 : k \in \rho^2(s)\}$, for $s \in S^2$.
• $\mu^{1\cdot 2} = \mu' \cup \mu'' \cup \mu'''$, where $\mu'$, $\mu''$ and $\mu'''$ are defined as follows.

$\mu' = \{(s, P, t) : (s, \{k \in P : k \leq r_1\}, t) \in \mu^1\}$. This relation is supposed to simulate the behavior of $A^1$ on a prefix of the input. Thus only the first $r_1$ windows matter.

$\mu'' = \{(s, P, t) : (s, \{k : k + r_1 \in P\}, t) \in \mu^2\}$. This relation is supposed to simulate the behavior of $A^2$ on a suffix of the input. Thus only the last $r_2$ windows matter.

$\mu''' = \{(f, P, s) : (q_0^2, \{k : k + r_1 \in P\}, s) \in \mu^2, f \in F^1\}$. This relation “connects” $A^1$ to $A^2$ by passing from a prefix belonging to $L(A^1)$ to a suffix belonging to $L(A^2)$. Thus $A^{1\cdot 2}$ first simulates $A^1$ on the first $r_1$ windows, and then simulates $A^2$ on the last $r_2$ windows.

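It can be verified that $L(A^1)(L(A^2) - \{\epsilon\}) = L(A^{1\cdot 2})$, where $\epsilon$ denotes the empty word. Indeed, if $(s_0^1, w_0^1), (s_1^1, w_1^1), \ldots, (s_{n_1}^1, w_{n_1}^1)$ is a run of $A^1$ on $\sigma_1$, and $(s_0^2, w_0^2), (s_1^2, w_1^2), \ldots, (s_{n_2}^2, w_{n_2}^2)$ is a run of $A^2$ on $\sigma_2$, then $(s_0^1, w_0^1 w_0^2), (s_1^1, w_1^1 w_0^2), \ldots, (s_{n_1}^1, w_{n_1}^1 w_0^2), (s_1^2, w_{n_1}^1 w_1^2), \ldots, (s_{n_2}^2, w_{n_1}^1 w_{n_2}^2)$ is a run of $A^{1\cdot 2}$ on $\sigma_1\sigma_2$.

Conversely, let $(s_0, w_0^1 w_0^2), (s_1, w_1^1 w_1^2), \ldots, (s_n, w_n^1 w_n^2)$ be an accepting run of $A^{1\cdot 2}$ on $\sigma_1 \cdots \sigma_n$. Then $s_0 = q_0^1$, and $s_n \in F^2$. Since passing from a state of $A^1$ to a state of $A^2$ is possible only by means of a transition from $\mu'''$, for some $i = 0, 1, \ldots, n-1$, $s_i \in F^1$, $(q_0^2, \{k : w_{i+1,k}^2 = \sigma_{i+1}\}, s_{i+1}) \in \mu^2$, and $w_i^2 = u^2$. Therefore $(s_0, w_0^1), (s_1, w_1^1), \ldots, (s_i, w_i^1)$ is an accepting run of $A^1$ on $\sigma_1 \cdots \sigma_i$ and $(q_0^2, w_i^2), (s_{i+1}, w_{i+1}^2), \ldots, (s_n, w_n^2)$ is an accepting run of $A^2$ on $\sigma_{i+1} \cdots \sigma_n$.

Now, if $\epsilon \notin L(A^2)$, then $L(A^1)L(A^2) = L(A^{1\cdot 2})$. Otherwise, $L(A^1)L(A^2) = L(A^{1\cdot 2}) \cup L(A^1)$, and the result follows from the closure of quasi-regular languages under union.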

Closure under Kleene star. Let $A = \langle S, q_0, u, \rho, \mu, F \rangle$, where $u = u_1 \cdots u_r$, be an M-automaton. Introducing a new initial state, if necessary, we may assume that $\mu \cap (S \times 2^{\{1,2,\ldots,r\}} \times \{q_0\}) = \emptyset$, i.e., $A$ cannot enter $q_0$ from any of its states. Let $u' = u_1' \cdots u_r'$ be any M-assignment of length $r$. We intend to prove that the $2r$-window M-automaton $A_{u'} = \langle S \times \{0,1\}^r, (q_0, 0^r), uu', \rho^*, \mu^*, \{(q_0, 0^r)\} \rangle$, where $\rho^*$ and $\mu^*$ are defined below, accepts $(L(A))^*$.

• $\rho^*((s, a)) = \{k + r : k \in \rho(s)\}$. That is, $\rho^*((s, a))$ is a shift of $\rho(s)$ by $r$, implying that the first $r$ windows of $A_{u'}$ (which contain the initial M-assignment of $A$) remain unchanged during the computation. This property of $\rho^*$ is used to reset $A_{u'}$ for processing the coming element of $L(A)$.

• $\mu^* = \mu' \cup \mu''$, where $\mu'$ and $\mu''$ are defined as follows.

$\mu'$ consists of the elements $((s, (a_1, a_2, \ldots, a_r)), P, (t, (b_1, b_2, \ldots, b_r)))$ which satisfy the following conditions. If $k \in \rho(s)$, then $b_k = 1$, and if $k \notin \rho(s)$, then $b_k = a_k$. Furthermore, for $P_b = \{k : k \leq r, k \in P \text{ and } b_k = 0\} \cup \{k : r + k \in P \text{ and } b_k = 1\}$, $(s, P_b, t) \in \mu$.

$\mu'' = \{((s, a), P, (q_0, 0^r)) : ((s, a), P, (f, b)) \in \mu', (f, b) \in F \times \{0,1\}^r\}$.

The elements of $\mu''$ “reset” $A_{u'}$ after “accepting” a word from $L(A)$, and the elements of $\mu'$ simulate the behavior of $A$ after resetting. Namely, $a_k = 1$ ($0$) indicates that the $k$th symbol of the M-assignment of $A$ has (not) been replaced, and, respectively, $A_{u'}$ “consults” the second (first) half of its current assignment. That is, the M-assignment $v_1 \cdots v_r$ of $A$ corresponding to the M-assignment $w_1 \cdots w_{2r}$ of $A_{u'}$ is defined by $v_k = w_k$, if $a_k = 0$, and $v_k = w_{k+r}$, if $a_k = 1$. In other words, $v_k = w_{k + a_k r}$, $k = 1, 2, \ldots, r$. Notice that, by the definition of $\rho^*$, always $w_1 \cdots w_r = u$.

We break the proof of the equality $(L(A))^* = L(A_{u'})$ into several stages. The first stage is to show that $L(A)$ is a subset of $L(A_{u'})$. Let $\sigma = \sigma_1\sigma_2\cdots\sigma_n \in L(A)$, and let $(s_0, w_0), (s_1, w_1), \ldots, (s_{n-1}, w_{n-1}), (s_n, w_n)$ be an accepting run of $A$ on $\sigma$. We transform it into an accepting run of $A_{u'}$ on $\sigma$, $((s_0, a_0), uv_0), ((s_1, a_1), uv_1), \ldots, ((s_{n-1}, a_{n-1}), uv_{n-1}), ((q_0, 0^r), uv_n)$, where $a_{i,k} = 1$ if and only if $k \in \bigcup_{j<i} \rho(s_j)$, $i = 0, 1, \ldots, n-1$, by induction, as follows.

Let $v_0 = u'$, and assume that $v_i$ has been defined such that $v_{i,k} = w_{i,k}$, if $a_{i,k} = 1$, and $v_{i,k} = u_k'$, if $a_{i,k} = 0$. We define $v_{i+1}$ by $v_{i+1,k} = \sigma_{i+1}$, if $k \in \rho(s_i)$, and $v_{i+1,k} = v_{i,k}$, if $k \notin \rho(s_i)$. It follows from the definition of the $a_i$'s that $v_{i+1}$ satisfies the induction hypothesis for $i+1$. Since $v_{i+1,k}$ is stored in the $(k+r)$th window of $A_{u'}$, our definition of $v_{i+1}$ agrees with the M-reassignment imposed by $\rho^*(s_i)$. Let $P$ denote the set of all indices of the windows of $uv_{i+1}$ containing $\sigma_{i+1}$. Then, by the definition of $v_{i+1}$,

$P = \{k : u_k = \sigma_{i+1}\} \cup \{r + k : v_{i+1,k} = w_{i+1,k} = \sigma_{i+1}, a_{i+1,k} = 1\} \cup \{r + k : u_k' = \sigma_{i+1}, a_{i+1,k} = 0\}.$

It immediately follows from the definition of $a_{i+1}$ that if $a_{i+1,k} = 0$, then $w_{i+1,k} = u_k$. Therefore

$P_{a_{i+1}} = \{k : k \leq r, k \in P \text{ and } a_{i+1,k} = 0\} \cup \{k : r + k \in P \text{ and } a_{i+1,k} = 1\} = \{k : w_{i+1,k} = \sigma_{i+1}\}.$

Since $(s_i, P_{a_{i+1}}, s_{i+1}) \in \mu$, the membership $(((s_i, a_i), uv_i), \sigma_{i+1}, ((s_{i+1}, a_{i+1}), uv_{i+1})) \in (\mu^*)^c$ for $i < n-1$ follows from the definition of $\mu'$, and the membership $(((s_{n-1}, a_{n-1}), uv_{n-1}), \sigma_n, ((q_0, 0^r), uv_n)) \in (\mu^*)^c$ follows from the definition of $\mu''$.

The second stage of the proof is as follows. Let $\sigma = \sigma_1\sigma_2\cdots\sigma_n \in L(A_{u'})$, $n > 0$, and let $((q_0, 0^r), uu'), ((s_1, a_1), uv_1), \ldots, ((s_{n-1}, a_{n-1}), uv_{n-1}), ((q_0, 0^r), uv_n)$ be an accepting run of $A_{u'}$ on $\sigma$ such that for $i = 1, 2, \ldots, n-1$, $s_i \neq q_0$. For $i = 0, 1, \ldots, n$, let the M-assignments $w_i$ be defined by $w_{i,k} = v_{i,k}$, if $a_{i,k} = 1$, and $w_{i,k} = u_k$, if $a_{i,k} = 0$. Let $s_n \in F$ and $a_n \in \{0,1\}^r$ be such that $(((s_{n-1}, a_{n-1}), uv_{n-1}), \sigma_n, ((s_n, a_n), uv_n)) \in (\mu')^c$ (see the definition of the $\mu''$ part of $\mu^*$). We contend that $(s_0, w_0), (s_1, w_1), \ldots, (s_{n-1}, w_{n-1}), (s_n, w_n)$ is an accepting run of $A$ on $\sigma$ (implying $\sigma \in L(A)$).

First we examine the relationship between $w_i$ and $w_{i+1}$. If $k \in \rho(s_i)$, then, by the definition of $\rho^*$, $k + r \in \rho^*(s_i)$, and, by the definition of $\mu^*$, $a_{i+1,k} = 1$ and $v_{i+1,k} = \sigma_{i+1}$. Thus $w_{i+1,k} = v_{i+1,k} = \sigma_{i+1}$. If $k \notin \rho(s_i)$, then $k + r \notin \rho^*(s_i)$, and, by the definition of $\mu^*$, $a_{i+1,k} = a_{i,k}$. Thus $w_{i+1,k} = v_{i+1,k}\,(u_k) = v_{i,k}\,(u_k) = w_{i,k}$, where the alternative in parentheses applies when $a_{i,k} = 0$. This shows that the M-assignments $w_i$ satisfy the requirements on the transition between the configurations of $A$. We proceed to show that $((s_i, w_i), \sigma_{i+1}, (s_{i+1}, w_{i+1})) \in \mu^c$. Let $P$ denote the set of all indices of the windows of $A_{u'}$ containing $\sigma_{i+1}$, i.e., $P = \{k : u_k = \sigma_{i+1}\} \cup \{k + r : v_{i+1,k} = \sigma_{i+1}\}$. Since $w_{i+1,k} = u_k$, if $a_{i+1,k} = 0$, and $w_{i+1,k} = v_{i+1,k}$, if $a_{i+1,k} = 1$,

$P_{a_{i+1}} = \{k : k \leq r, k \in P \text{ and } a_{i+1,k} = 0\} \cup \{k : r + k \in P \text{ and } a_{i+1,k} = 1\} = \{k : w_{i+1,k} = \sigma_{i+1}\}.$

Since $((s_i, a_i), P, (s_{i+1}, a_{i+1})) \in \mu'$, by definition, $(s_i, P_{a_{i+1}}, s_{i+1}) \in \mu$, implying $((s_i, w_i), \sigma_{i+1}, (s_{i+1}, w_{i+1})) \in \mu^c$.

Now we are ready to prove the equality $(L(A))^* = L(A_{u'})$. Let $\sigma \in L(A_{u'})$. If $\sigma$ is the empty word, then, definitely, it belongs to $(L(A))^*$. Assume that $\sigma = \sigma_1 \cdots \sigma_n$ is not the empty word, and let $((s_0, a_0), uv_0), ((s_1, a_1), uv_1), \ldots, ((s_{n-1}, a_{n-1}), uv_{n-1}), ((s_n, a_n), uv_n)$ be an accepting run of $A_{u'}$ on $\sigma$. Let $s_{n_1}, s_{n_2}, \ldots, s_{n_m}$ be all appearances of $q_0$ in the sequence $s_0, s_1, \ldots, s_n$. (In particular, $n_1 = 0$ and $n_m = n$.) Then $\sigma_{n_i+1} \cdots \sigma_{n_{i+1}} \in L(A_{v_{n_i}})$, $i = 1, 2, \ldots, m-1$, and, by the second stage of the proof with $u' = v_{n_i}$, $\sigma_{n_i+1} \cdots \sigma_{n_{i+1}} \in L(A)$. Thus $\sigma$ is a concatenation of words accepted by $A$.

Next we prove by induction that any concatenation of the elements of $L(A)$ is accepted by $A_{u'}$. The basis, that is, the empty concatenation, is immediate. For the induction step we prove that if $\sigma = \sigma_1 \cdots \sigma_n \in L(A_{u'})$ and $\tau = \sigma_{n+1} \cdots \sigma_{n+m} \in L(A)$, then $\sigma\tau \in L(A_{u'})$. Let $((s_0, a_0), uv_0), ((s_1, a_1), uv_1), \ldots, ((s_{n-1}, a_{n-1}), uv_{n-1}), ((s_n, a_n), uv_n)$ be an accepting run of $A_{u'}$ on $\sigma$. Thus $(s_n, a_n) = (q_0, 0^r)$. By the first stage of the proof with $u' = v_n$, there is an accepting run $((q_0, 0^r), uv_n), ((s_{n+1}, a_{n+1}), uv_{n+1}), \ldots, ((s_{n+m}, a_{n+m}), uv_{n+m})$ of $A_{v_n}$ on $\tau$. Thus $((s_0, a_0), uv_0), ((s_1, a_1), uv_1), \ldots, ((s_n, a_n), uv_n), ((s_{n+1}, a_{n+1}), uv_{n+1}), \ldots, ((s_{n+m}, a_{n+m}), uv_{n+m})$ is an accepting run of $A_{u'}$ on $\sigma\tau$.


The following examples show that quasi-regular languages are not closed under either homomorphisms or inverse homomorphisms, because in order to accept the (inverse) homomorphic image of a quasi-regular language, a finite-memory automaton may need to “remember” infinitely many identities over the input alphabet.


Example 6. Let $\Sigma = \{\sigma_1, \sigma_2, \ldots\}$, $\Sigma' = \{\tau_1, \tau_2, \ldots\}$, and let $\iota : \Sigma^* \to \Sigma'^*$ be a homomorphism defined by $\iota(\sigma_{3i}) = \iota(\sigma_{3i-1}) = \tau_{2i}$ and $\iota(\sigma_{3i-2}) = \tau_{2i-1}$, $i = 1, 2, \ldots$ Let $A$ be a finite-memory automaton over $\Sigma$ defined by the diagram shown in Fig. 4.

Obviously, $L(A) = \{\sigma_i\sigma_j : i \neq j\}$ and $\iota(L(A)) = \{\tau_i\tau_j : i \neq j\} \cup \{\tau_{2i}\tau_{2i} : i = 1, 2, \ldots\}$. We show below that $\iota(L(A))$ is not quasi-regular. Thus quasi-regular languages are not closed under homomorphisms. The fact that $\iota(L(A))$ is not quasi-regular can be proved as follows. Assume to the contrary that it is quasi-regular, and let $A'$ be a finite-memory automaton over $\Sigma'$ such that $L(A') = \iota(L(A))$. Let $i$ be such that neither $\tau_{2i}$ nor $\tau_{2i+1}$ belongs to $[u_{A'}]$, and let $\iota'$ be an automorphism of $\Sigma'$ that permutes $\tau_{2i}$ with $\tau_{2i+1}$ and leaves all other symbols unchanged. Since $\tau_{2i}\tau_{2i} \in \iota(L(A))$, by Proposition 2, $\tau_{2i+1}\tau_{2i+1} \in \iota(L(A))$ $(= \{\tau_i\tau_j : i \neq j\} \cup \{\tau_{2i}\tau_{2i} : i = 1, 2, \ldots\})$. This contradiction completes the proof.
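The shape of $\iota(L(A))$ in Example 6 can be checked on a finite fragment of the alphabet. In the sketch below, symbols are represented only by their indices; the function names and the cut-off `n` are assumptions made for illustration.

```python
def iota(i):
    """Index j such that iota(sigma_i) = tau_j, as defined in Example 6."""
    k, rem = divmod(i + 2, 3)      # i = 3k - 2 + rem with rem in {0, 1, 2}
    # sigma_{3k-2} -> tau_{2k-1}; sigma_{3k-1} and sigma_{3k} -> tau_{2k}
    return 2 * k - 1 if rem == 0 else 2 * k


def image_of_pairs(n):
    """Images of all words sigma_i sigma_j with i != j and i, j <= n."""
    return {(iota(i), iota(j))
            for i in range(1, n + 1) for j in range(1, n + 1) if i != j}


IMAGE = image_of_pairs(9)
# Indices j for which tau_j tau_j lies in the image.
DOUBLES = sorted(j for (j, k) in IMAGE if j == k)
```

On this fragment only even indices occur in `DOUBLES`, in line with the description of $\iota(L(A))$ above; this asymmetry between $\tau_{2i}$ and $\tau_{2i+1}$ is what the automorphism argument exploits.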


Example 7. Let $\Sigma$, $\Sigma'$ and $\iota$ be as in Example 6. Let $A'$ be a finite-memory automaton over $\Sigma'$ defined by the diagram shown in Fig. 5.

Obviously, $L(A') = \{\tau_i\tau_i : i = 1, 2, \ldots\}$ and $\iota^{-1}(L(A')) = \bigcup_{i=1}^{\infty} \{\sigma_i\sigma_i, \sigma_{3i-1}\sigma_{3i}, \sigma_{3i}\sigma_{3i-1}\}$. In order to prove that $\iota^{-1}(L(A'))$ is not quasi-regular, assume to the contrary that $\iota^{-1}(L(A'))$ is quasi-regular, and let $A$ be a finite-memory automaton over $\Sigma$ such that $L(A) = \iota^{-1}(L(A'))$. Let $i$ be such that neither $\sigma_{3i-1}$ nor $\sigma_{3i-2}$ belongs to $[u_A]$, and let $\iota'$ be an automorphism of $\Sigma$ that permutes $\sigma_{3i-1}$ with $\sigma_{3i-2}$ and leaves all other symbols unchanged. Since $\sigma_{3i-1}\sigma_{3i} \in \iota^{-1}(L(A'))$, by Proposition 2, $\sigma_{3i-2}\sigma_{3i} \in \iota^{-1}(L(A'))$ $(= \bigcup_{i=1}^{\infty} \{\sigma_i\sigma_i, \sigma_{3i-1}\sigma_{3i}, \sigma_{3i}\sigma_{3i-1}\})$. This contradiction implies that $\iota^{-1}(L(A'))$ is not quasi-regular. Thus quasi-regular languages are not closed under inverse homomorphisms.


Remark 4. Actually, under a reasonably weak assumption it can be shown that any class $\mathcal{L}$ of languages over an infinite alphabet which is defined by a set of machines having a finite description is not closed under either homomorphisms or inverse homomorphisms. First we observe that, since the set of machines having a finite description is countable, $\mathcal{L}$ is countable. We prove that $\mathcal{L}$ is not closed under homomorphisms under the assumption that $\Sigma = \{\sigma_1, \sigma_2, \ldots\} \in \mathcal{L}$. Since $\mathcal{L}$ is countable, there exists an infinite subset $L = \{\sigma_{i_1}, \sigma_{i_2}, \ldots\}$ of $\Sigma$ such that $L \notin \mathcal{L}$. Let $\iota : \Sigma \to \Sigma$ be defined by $\iota(\sigma_j) = \sigma_{i_j}$, $j = 1, 2, \ldots$ Then $\iota(\Sigma) = L$, which shows that $\mathcal{L}$ is not closed under homomorphisms.$^{10}$

$^{10}$ It follows from the proof that we can replace $\Sigma$ by any infinite subset.

Next we prove that $\mathcal{L}$ is not closed under inverse homomorphisms under the assumption that $\{\sigma_1\} \in \mathcal{L}$. Let $L$ be as above, and let $\iota' : \Sigma \to \Sigma$ be defined by $\iota'(\sigma) = \sigma_1$, if $\sigma \in L$, and $\iota'(\sigma) = \sigma_2$ otherwise. Then $\iota'^{-1}(\{\sigma_1\}) = L$, which shows that $\mathcal{L}$ is not closed under inverse homomorphisms.$^{11}$

Obviously, the above proofs hold for quasi-regular languages as well. However, Examples 6 and 7 are of interest because they are constructive, and the homomorphism in these examples is surjective and finite. That is, its range is all of $\Sigma'$, and the number of sources of the elements of $\Sigma'$ is bounded uniformly. Thus quasi-regular languages are not closed even under (inverses of) finite homomorphisms.

Finally we show that quasi-regular languages are not closed under reversing. 


Example 8. For a word $\sigma = \sigma_1\sigma_2\cdots\sigma_n$, the reversal $\sigma^R$ of $\sigma$ is $\sigma^R = \sigma_n\sigma_{n-1}\cdots\sigma_1$, and for a language $L \subseteq \Sigma^*$, the reversal $L^R$ of $L$ is $L^R = \{\sigma^R : \sigma \in L\}$. Consider a language $L_3$ that is defined by

$L_3 = \{\sigma_1\sigma_2\cdots\sigma_n : \sigma_i \neq \sigma_1, i = 2, 3, \ldots, n\}.$

That is, $L_3$ consists of all words whose first symbol is different from all other symbols. It can readily be seen that $L_3$ is accepted by the finite-memory automaton shown in Fig. 6.

Indeed, being in the initial state $q_0$, the automaton stores the first input symbol in the first window and changes the state to $f$. Since the automaton cannot leave $f$ and cannot make a move from $f$ on an input stored in the first window, the result follows.
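The behavior just described can be reproduced with a small simulator. Fig. 6 is not reproduced in the text, so the two-window automaton below is an assumption reconstructed from the prose: $\rho(q_0) = 1$, $\rho(f) = 2$, transitions $(q_0, 1, f)$ and $(f, 2, f)$, initial assignment `##`. The function names and the use of `#` for an empty window are likewise illustrative.

```python
from itertools import product


def fma_accepts(word, q0, u, rho, mu, final):
    """Nondeterministic finite-memory automaton; rho: state -> window."""
    configs = {(q0, tuple(u))}
    for sym in word:
        nxt = set()
        for s, w in configs:
            if sym in w:                       # symbol is stored in window k
                k, v = w.index(sym) + 1, w
            elif s in rho:                     # new symbol goes to rho(s)
                k = rho[s]
                v = w[:k - 1] + (sym,) + w[k:]
            else:
                continue
            nxt |= {(t, v) for (a, j, t) in mu if a == s and j == k}
        configs = nxt
    return any(s in final for s, _ in configs)


# Assumed reading of Fig. 6: no transition from f on window 1.
RHO = {"q0": 1, "f": 2}
MU = {("q0", 1, "f"), ("f", 2, "f")}


def in_L3(word):
    return fma_accepts(word, "q0", "##", RHO, MU, {"f"})


def first_symbol_unique(word):
    """Direct test: the first symbol differs from all other symbols."""
    return len(word) > 0 and word[0] not in word[1:]


# Exhaustive comparison on nonempty words over {a, b, c} up to length 4.
AGREE = all(in_L3(w) == first_symbol_unique(w)
            for n in range(1, 5) for w in product("abc", repeat=n))
```

With this reading the automaton rejects the empty word, because $q_0$ is not final; the comparison therefore ranges over nonempty words only.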

The reversal $L_3^R$ of the language $L_3$ consists of all the words whose last symbol is different from all others. That is, $L_3^R = L_2$, where $L_2$ is the language from Example 4. As was shown in that example, $L_2$ is not quasi-regular. Thus quasi-regular sets are not closed under reversing.

$^{11}$ It follows from the proof that we can replace $\{\sigma_1\}$ by any nontrivial subset of $\Sigma$, and replace $\sigma_1$ and $\sigma_2$ by symbols belonging and not belonging, respectively, to that subset.


4. Deterministic and two-way finite-memory automata 


In this section we briefly discuss deterministic and two-way deterministic finite- 
memory automata and present several examples. Some questions which concern the 
above models and might be of interest are listed in the concluding section. As we 
mentioned earlier the computation power of these models differs from that of finite- 
memory automata. First we consider the deterministic one-way model, that is weaker 
than the nondeterministic one, see Remark 2. 


Definition 3. A finite-memory automaton $A = \langle S, q_0, u, \rho, \mu, F \rangle$ is called deterministic if $\rho$ is everywhere defined, and for each $s \in S$ and each $k = 1, 2, \ldots, r_A$ there exists exactly one $t \in S$ such that $(s, k, t) \in \mu$. That is, $\rho$ is a function from $S$ into $\{1, 2, \ldots, r_A\}$ and $\mu$ can be thought of as a function from $S \times \{1, 2, \ldots, r_A\}$ into $S$.


Definition 4. An M-automaton $A = \langle S, q_0, u, \rho, \mu, F \rangle$ is called deterministic if for each $s \in S$ and each nonempty subset $W$ of $\{1, 2, \ldots, r_A\}$ there exists exactly one $t \in S$ such that $(s, W, t) \in \mu$. That is, $\mu$ can be thought of as a function from $S \times (2^{\{1,2,\ldots,r_A\}} - \{\emptyset\})$ into $S$.

A routine examination of the constructions in Remark 3 and in the proof of Theorem 2 in the previous section shows that a language is accepted by a deterministic finite-memory automaton if and only if it is accepted by a deterministic M-automaton. (This equivalence provides an indirect indication of the robustness of the definition of the deterministic model of computation.) We call the languages accepted by deterministic finite-memory (or M-) automata “deterministic quasi-regular languages.” By Remark 2, the languages accepted by deterministic finite-memory automata are closed under complementation, and a straightforward analysis of the proof of the union and intersection parts of Theorem 3 shows that the languages accepted by deterministic M-automata are closed under union and intersection. Therefore the deterministic quasi-regular languages are closed under Boolean operations. However, the deterministic quasi-regular languages are not closed under any of reversing, concatenation, and Kleene star. The nonclosure under these operations follows from the examples below.


Example 9. Consider a deterministic finite-memory automaton that has the graph representation shown in Fig. 7.

The automaton behavior is as follows. Being in the initial state, it stores the first input symbol in the first window and changes the state to $q$. After that, storing new input symbols in the second window, the automaton waits for the second appearance of the symbol stored in the first window. Then it enters the final state, which is impossible to leave. Thus the language $L_4$ accepted by the above automaton consists exactly of those words where the first symbol appears twice or more:

$L_4 = \{\sigma_1\sigma_2\cdots\sigma_n : \text{there exists } i = 2, 3, \ldots, n \text{ such that } \sigma_i = \sigma_1\}.$

We have $L_4 = \bar{L}_3$, where $L_3$ is the language from Example 8. Thus $L_4^R = (\bar{L}_3)^R = \overline{L_3^R}$. Were the language $L_4^R$ deterministic, by Remark 2, its complement $\overline{L_4^R} = L_3^R$ would also be deterministic, in contradiction with Example 8. Therefore deterministic quasi-regular languages are not closed under reversing.

Example 10 below shows that deterministic quasi-regular languages are not closed 
under either Kleene star or concatenation. 


Example 10. Consider a deterministic finite-memory automaton that has the graph representation shown in Fig. 8.

The automaton behavior is as follows. Being in the initial state, it stores the first input symbol in the first window and changes the state to $q$. The automaton can leave the state $q$ (and enter the only final state $f$) if and only if the input symbol is equal to that stored in the first window, i.e., to the first input symbol. Furthermore, the automaton can leave $f$ (and enter the nonfinal state $q$) if and only if the input symbol is not equal to that stored in the first window. Thus the language $L_5$ accepted by the above automaton consists exactly of the words of length greater than 1 with equality holding between the first and the last symbols:

$L_5 = \{\sigma_1\sigma_2\cdots\sigma_n : \sigma_1 = \sigma_n, n > 1\}.$
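A deterministic finite-memory automaton is particularly easy to simulate, since a single configuration has to be tracked. Fig. 8 is not reproduced in the text, so the transition table below is an assumption reconstructed from the description: $\rho(q_0) = 1$, $\rho(q) = \rho(f) = 2$, and the entry for $(q_0, 2)$, which is never used on inputs without `#`, is filled in arbitrarily to make $\mu$ total. The function names are illustrative.

```python
from itertools import product


def det_fma_accepts(word, q0, u, rho, mu, final):
    """Deterministic finite-memory automaton: mu maps (state, window) to state."""
    state, w = q0, list(u)
    for sym in word:
        if sym in w:
            k = w.index(sym) + 1      # the symbol is stored in window k
        else:
            k = rho[state]            # new symbol: store it in window rho(state)
            w[k - 1] = sym
        state = mu[(state, k)]
    return state in final


# Assumed reading of Fig. 8.
RHO = {"q0": 1, "q": 2, "f": 2}
MU = {("q0", 1): "q", ("q0", 2): "q",
      ("q", 1): "f", ("q", 2): "q",
      ("f", 1): "f", ("f", 2): "q"}


def in_L5(word):
    return det_fma_accepts(word, "q0", "##", RHO, MU, {"f"})


# Exhaustive comparison with the defining condition of L_5
# on words over {a, b, c} of length at most 5.
AGREE = all(in_L5(w) == (len(w) > 1 and w[0] == w[-1])
            for n in range(6) for w in product("abc", repeat=n))
```

Because the first window is never reassigned after the first step, the state after reading at least two symbols is $f$ exactly when the last symbol equals the first one.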


We contend that $L_5^*$ is not a deterministic quasi-regular language. To prove our contention, assume to the contrary that $L_5^* = L(A)$, where $A = \langle S, q_0, u, \rho, \mu, F \rangle$ is a deterministic finite-memory automaton. Let $\sigma_1, \sigma_2, \ldots, \sigma_{r_A+1}$ be pairwise distinct elements of $\Sigma$. Then for each $i = 1, 2, \ldots, r_A+1$ the word $\sigma_1\sigma_1\sigma_2\sigma_1\sigma_3\cdots\sigma_1\sigma_{r_A}\sigma_1\sigma_{r_A+1}\sigma_i$ belongs to $L_5^*$, because both $\sigma_1\sigma_1\sigma_2\sigma_1\sigma_3\cdots\sigma_1\sigma_{i-1}\sigma_1$ and $\sigma_i\sigma_1\sigma_{i+1}\sigma_1\sigma_{i+2}\cdots\sigma_1\sigma_{r_A}\sigma_1\sigma_{r_A+1}\sigma_i$ belong to $L_5$. Since $A$ is a deterministic finite-memory automaton, there is a unique configuration $(s, w)$ that $A$ can enter after reading $\sigma_1\sigma_1\sigma_2\sigma_1\sigma_3\cdots\sigma_1\sigma_{r_A}\sigma_1\sigma_{r_A+1}$. Then for each $i = 1, 2, \ldots, r_A+1$, $A_{(s,w)} = \langle S, s, w, \rho, \mu, F \rangle$ must accept $\sigma_i$. Since $A_{(s,w)}$ has $r_A$ windows, there is an $i = 1, 2, \ldots, r_A+1$ such that $\sigma_i \notin [w]$. Let $\tau$ be a symbol different from any of the $\sigma_i$'s and let $\iota$ be the automorphism of $\Sigma$ that permutes $\tau$ with $\sigma_i$ and fixes all other symbols. By Proposition 2, $A_{(s,w)}$ accepts $\tau$. Therefore $A$ accepts the word $\sigma_1\sigma_1\sigma_2\sigma_1\sigma_3\cdots\sigma_1\sigma_{r_A}\sigma_1\sigma_{r_A+1}\tau$, which is impossible, because no suffix of that word belongs to $L_5$. This shows that deterministic quasi-regular languages are not closed under Kleene star. Notice that it follows from the above proof that the language $L_5 L_5$ is not deterministic either. Thus deterministic quasi-regular languages are not closed under concatenation.


Finally we consider two-way deterministic finite-memory automata. Our definition is basically that of a two-way deterministic finite automaton, see [8],$^{12}$ relativized to the case of finite-memory automata.


Definition 5. A two-way deterministic finite-memory automaton is a system $A = \langle S, q_0, u, \rho, \mu, F \rangle$, where $S$, $q_0$, $u$, $\rho$ and $F$ are as in a deterministic finite-memory automaton, and the transition function $\mu$ maps $S \times \{1, 2, \ldots, r\}$ into $S \times \{-1, 1\}$.


The meaning of $\mu$ is as follows. If $\mu(s, k) = (t, -1)$, then in state $s$, scanning the input symbol stored in the $k$th window, the automaton enters state $t$ and moves left. If $\mu(s, k) = (t, 1)$, then in state $s$, scanning the input symbol stored in the $k$th window, the automaton enters state $t$ and moves right. The first and the second components of $\mu(s, k)$ are denoted by $\mu_1(s, k)$ and $\mu_2(s, k)$, respectively. That is, $\mu_1 : S \times \{1, 2, \ldots, r\} \to S$, $\mu_2 : S \times \{1, 2, \ldots, r\} \to \{-1, 1\}$, and $\mu = (\mu_1, \mu_2)$.

$^{12}$ See also [1, pp. 36-42].

A configuration of $A$ is a pair $(s, w)$, where $s \in S$, and $w$ is an assignment of length $r$. The transition function $\mu$ induces the function $\mu^c : S^c \times \Sigma \to S^c \times \{-1, 1\}$ that is defined as follows. Let $c = (s, w)$.
• If $\sigma$ is the $k$th symbol of $w$, then $\mu^c(c, \sigma) = ((\mu_1(s, k), w), \mu_2(s, k))$.
• If $\sigma \notin [w]$, then $\mu^c(c, \sigma) = ((\mu_1(s, \rho(s)), v), \mu_2(s, \rho(s)))$, where $v$ results from $w$ by replacing the $\rho(s)$th symbol of $w$ with $\sigma$.

The first and the second components of $\mu^c(c, \sigma)$ are denoted by $\mu_1^c(c, \sigma)$ and $\mu_2^c(c, \sigma)$, respectively.

As in the case of a two-way deterministic finite automaton, the future behavior of a two-way deterministic finite-memory automaton on a given input depends only on the automaton configuration and the head position. The formal definition of this combination is as follows.


Definition 6. Let $A$ be as above. An instantaneous description of $A$ on the input word $\sigma$ is a pair $(c, i)$, where $c \in S^c$ and $i$ is a positive integer not exceeding $|\sigma| + 1$, where $|\sigma|$ denotes the length of $\sigma$.


The instantaneous description $(c, i)$ is intended to represent the facts that $c$ is the current automaton configuration during the computation on the input $\sigma$, and the automaton head is scanning the $i$th symbol of $\sigma$, if $i \leq |\sigma|$, and the head has fallen off the right end of $\sigma$, if $i = |\sigma| + 1$.

Next we define the successor relation $\vdash_{A,\sigma}$ on the set of instantaneous descriptions of $A$ on $\sigma$. Let $\sigma = \sigma_1\sigma_2\cdots\sigma_n$. If $i = 1$ and $\mu_2^c(c, \sigma_1) = -1$, or $i = n+1$, then $(c, i)$ has no successor. Otherwise $(c, i) \vdash_{A,\sigma} (c', i')$ if and only if $c' = \mu_1^c(c, \sigma_i)$ and $i' = i + \mu_2^c(c, \sigma_i)$. The requirement $i > 1$ for $\mu_2^c(c, \sigma_i) = -1$ prevents any action in the event that the automaton head would move off the left end of the input, and the requirement $i \leq n$ prevents any action once the automaton head has moved off the right end of the input.

Let $\vdash_{A,\sigma}^*$ be the transitive and reflexive closure of $\vdash_{A,\sigma}$. We say that $A$ accepts $\sigma \in \Sigma^*$ if for some final configuration $f^c \in F^c$, $(q_0^c, 1) \vdash_{A,\sigma}^* (f^c, |\sigma| + 1)$. That is, $\sigma$ is accepted by $A$ if, starting in the state $q_0$ with the head on the first symbol of $\sigma$, $A$ eventually enters a final state at the same time it falls off the right end of the input. As usual, the set of all words accepted by $A$ is denoted by $L(A)$.


Example 11. It has been shown in Proposition 5 that the complement $\bar{L}_1$ of the language $L_1$ presented in Example 1 is not quasi-regular. In this example we show that $\bar{L}_1$ is accepted by a (three-window) two-way deterministic finite-memory automaton. (Recall that $\bar{L}_1$ consists of all words where each symbol appears at most one time.) The reason for $\bar{L}_1$ to be accepted by a two-way deterministic finite-memory automaton is that if the input belongs to $\bar{L}_1$, then it is possible to return to the $i$th position of the input by remembering $\sigma_i$ in one of the windows. Before describing formally an automaton that accepts $\bar{L}_1$, we give the general idea behind the proof. Observe that $\sigma_1\sigma_2\cdots\sigma_n \in \bar{L}_1$ if and only if for each $i = 2, 3, \ldots, n$, $\sigma_1\sigma_2\cdots\sigma_i \in \bar{L}_1$. Now, given an input $\sigma = \sigma_1\sigma_2\cdots\sigma_n$, our automaton first stores $\sigma_1$ in the first window and then for each $i = 2, 3, \ldots, n$ verifies whether $\sigma_1\sigma_2\cdots\sigma_i \in \bar{L}_1$. For such verification the automaton performs the following sequence of moves. After “accepting” $\sigma_1\sigma_2\cdots\sigma_{i-1}$, the automaton checks whether $\sigma_i = \sigma_1$. If the equality holds, then $\sigma \notin \bar{L}_1$ and the automaton enters a “dead state,” i.e., a nonfinal state which is impossible to leave. If $\sigma_i \neq \sigma_1$, the automaton stores $\sigma_i$ in the second window and starts moving left from $\sigma_i$ towards $\sigma_1$, trying to find out whether there is a $j = 2, 3, \ldots, i-1$ such that $\sigma_j = \sigma_i$. (In this sequence of moves the automaton stores new input symbols in the third window.) If such $j$ exists, then $\sigma \notin \bar{L}_1$ and the automaton enters a dead state. Otherwise the automaton will eventually reach $\sigma_1$. Since the automaton already “knows” from the previous verification that $\sigma_1\sigma_2\cdots\sigma_{i-1} \in \bar{L}_1$, arriving at $\sigma_1$ indicates that it is at the left end of $\sigma$ and $\sigma_1\sigma_2\cdots\sigma_i \in \bar{L}_1$. After arriving at the left end of the input, the automaton turns right and moves to $\sigma_i$. From $\sigma_i$ it moves right, enters a final state, and repeats the same procedure starting from $\sigma_{i+1}$, etc.

The formal description of a two-way deterministic finite-memory automaton
$A$ such that $\bar{L}_1 = L(A)$ is as follows. Let $A = \langle S, q_0, u, \rho, \mu, F \rangle$, where
$S = \{q_0, q_1, q_2, q_3, f\}$, $u = \#\#\#$, $\rho(q_0) = 1$, $\rho(q_1) = \rho(q_2) = \rho(q_3) = 3$, $\rho(f) = 2$, $F = \{f\}$,
and the transition function $\mu$ is defined by Table 1.


Table 1

 mu   |    1     |    2     |    3
------+----------+----------+----------
 q_0  |  f, 1    |  q_1, 1  |  q_1, 1
 q_1  |  q_1, 1  |  q_1, 1  |  q_1, 1
 q_2  |  q_3, 1  |  q_1, 1  |  q_2, -1
 q_3  |  q_1, 1  |  f, 1    |  q_3, 1
 f    |  q_1, 1  |  q_2, -1 |  q_1, 1


A graph representation of $A$ is given by the diagram of Fig. 9. In this
diagram the superscript of label $k$ of an edge exiting state $s$ is defined as follows. If
$\mu_2(s, k) = -1$, then the superscript is L, and if $\mu_2(s, k) = 1$, then the superscript is R.
That is, the superscript indicates the direction of the automaton movement.
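
The behaviour of $A$ on concrete inputs can be checked mechanically. The following Python sketch simulates a two-way deterministic finite-memory automaton driven by the transition table of Table 1. It is an illustration under stated assumptions, not part of the formal development: the names `RHO`, `MU` and `accepts` are hypothetical, `None` stands for the empty-window symbol #, and the entry for the pair (q_3, 1) is taken to be the dead state, an assumption that is harmless because that entry is never exercised on any input.

```python
# Reassignment function rho: the window that receives a symbol not yet stored.
RHO = {"q0": 1, "q1": 3, "q2": 3, "q3": 3, "f": 2}

# Transition table: MU[state][window] = (next state, head movement).
# The entry MU["q3"][1] is an assumption (dead state); it is never reached.
MU = {
    "q0": {1: ("f", 1),  2: ("q1", 1),  3: ("q1", 1)},
    "q1": {1: ("q1", 1), 2: ("q1", 1),  3: ("q1", 1)},
    "q2": {1: ("q3", 1), 2: ("q1", 1),  3: ("q2", -1)},
    "q3": {1: ("q1", 1), 2: ("f", 1),   3: ("q3", 1)},
    "f":  {1: ("q1", 1), 2: ("q2", -1), 3: ("q1", 1)},
}
FINAL = {"f"}


def accepts(word):
    """Return True if the automaton falls off the right end in a final state."""
    state = "q0"
    windows = [None, None, None]   # the initial assignment ###
    i = 1                          # 1-based head position
    n = len(word)
    while i <= n:
        symbol = word[i - 1]
        if symbol in windows:
            # The symbol is already stored: use the index of its window.
            k = windows.index(symbol) + 1
        else:
            # Otherwise store it in window rho(state) and use that index.
            k = RHO[state]
            windows[k - 1] = symbol
        state, move = MU[state][k]
        if i == 1 and move == -1:
            return False           # no successor: head would leave on the left
        i += move
    return state in FINAL
```

For instance, `accepts("abcd")` holds, whereas `accepts("abcb")` does not, in agreement with the informal description above. Input symbols may be any values other than `None`.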

Let $\sigma$ be a word over $\Sigma$. We prove by induction on $|\sigma|$ that if $\sigma \in \bar{L}_1$, then $A$ accepts
$\sigma$, and if $\sigma \notin \bar{L}_1$, then $A$ running on $\sigma$ must enter the dead state $q_1$. The case of
$|\sigma| = 1, 2, 3$ can be verified by diagram chasing. In particular, it can be easily seen that
if $\sigma_1\sigma_2\sigma_3 \in \bar{L}_1$, then $((q_0, \#\#\#), 1) \vdash^*_{A,\sigma_1\sigma_2\sigma_3} ((f, \sigma_1\sigma_3\sigma_2), 4)$. The reason for considering
the case of $n = 1, 2, 3$ separately is to avoid the possibility of $\sigma_{n-1} = \sigma_1$, see the proof of
the induction step below.

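For the induction step, let $n \ge 3$ and assume that for each $\sigma_1\sigma_2\cdots\sigma_n$, if $\sigma_1\sigma_2\cdots\sigma_n \in \bar{L}_1$,
then $((q_0, \#\#\#), 1) \vdash^*_{A,\sigma_1\sigma_2\cdots\sigma_n} ((f, \sigma_1\sigma_n\sigma_{n-1}), n+1)$, and if $\sigma_1\sigma_2\cdots\sigma_n \notin \bar{L}_1$, then
$A$ running on $\sigma_1\sigma_2\cdots\sigma_n$ eventually enters $q_1$. Let $\sigma = \sigma_1\sigma_2\cdots\sigma_{n+1}$. First
assume that $\sigma \in \bar{L}_1$. Then $\sigma_1\sigma_2\cdots\sigma_n \in \bar{L}_1$, and, by the induction hypothesis,
$((q_0, \#\#\#), 1) \vdash^*_{A,\sigma_1\sigma_2\cdots\sigma_n} ((f, \sigma_1\sigma_n\sigma_{n-1}), n+1)$. Since $A$ is a deterministic automaton,
it follows that also $((q_0, \#\#\#), 1) \vdash^*_{A,\sigma} ((f, \sigma_1\sigma_n\sigma_{n-1}), n+1)$. Therefore, in order to
prove the induction step for the case of $\sigma \in \bar{L}_1$ it suffices to show that
$((f, \sigma_1\sigma_n\sigma_{n-1}), n+1) \vdash^*_{A,\sigma} ((f, \sigma_1\sigma_{n+1}\sigma_n), n+2)$. Since $\sigma_{n+1} \notin [\sigma_1\sigma_n\sigma_{n-1}]$, being in state $f$ with the
head scanning $\sigma_{n+1}$, the automaton must store it in the second window. Then it has to
move left and enter state $q_2$: $((f, \sigma_1\sigma_n\sigma_{n-1}), n+1) \vdash_{A,\sigma} ((q_2, \sigma_1\sigma_{n+1}\sigma_{n-1}), n)$. Since
$\sigma_n \notin [\sigma_1\sigma_{n+1}\sigma_{n-1}]$, the automaton, being in state $q_2$, must store $\sigma_n$ in the third window
and move left, staying in $q_2$. After that, in the same manner, $A$ staying in $q_2$ must
continue to move left (storing each symbol it scans in the third window) until it arrives
at $\sigma_1$: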


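    $((q_2, \sigma_1\sigma_{n+1}\sigma_{n-1}), n) \vdash_{A,\sigma} ((q_2, \sigma_1\sigma_{n+1}\sigma_n), n-1)$
    $\vdash_{A,\sigma} ((q_2, \sigma_1\sigma_{n+1}\sigma_{n-1}), n-2)$
    $\vdash_{A,\sigma} \cdots \vdash_{A,\sigma} ((q_2, \sigma_1\sigma_{n+1}\sigma_2), 1)$.

Then the automaton turns right:

    $((q_2, \sigma_1\sigma_{n+1}\sigma_2), 1) \vdash_{A,\sigma} ((q_3, \sigma_1\sigma_{n+1}\sigma_2), 2)$
    $\vdash_{A,\sigma} ((q_3, \sigma_1\sigma_{n+1}\sigma_2), 3) \vdash_{A,\sigma} \cdots$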

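Since no symbol that $A$ scans on the way coincides with the content of its first or
second window, it will arrive at $\sigma_{n+1}$, and, finally,

    $((q_3, \sigma_1\sigma_{n+1}\sigma_n), n+1) \vdash_{A,\sigma} ((f, \sigma_1\sigma_{n+1}\sigma_n), n+2)$.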


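Now assume that $\sigma \notin \bar{L}_1$. If $\sigma_1\sigma_2\cdots\sigma_n \notin \bar{L}_1$, then, by the induction hypothesis,
$A$ enters the dead state before arriving at $\sigma_{n+1}$. Otherwise, by the induction hypothesis,
$((q_0, \#\#\#), 1) \vdash^*_{A,\sigma} ((f, \sigma_1\sigma_n\sigma_{n-1}), n+1)$. From the instantaneous description
$((f, \sigma_1\sigma_n\sigma_{n-1}), n+1)$ the computation of $A$ is as follows. Since $\sigma_1\sigma_2\cdots\sigma_n \in \bar{L}_1$ and
$\sigma_1\sigma_2\cdots\sigma_n\sigma_{n+1} \notin \bar{L}_1$, for some $i = 1, 2, \ldots, n$, $\sigma_{n+1} = \sigma_i$. If $i = 1$ or $i = n-1$, then $\sigma_{n+1}$
is stored in the first or the third window and the automaton immediately enters the dead
state. Otherwise, exactly as in the proof of the case of $\sigma \in \bar{L}_1$, it moves left towards $\sigma_1$
being in state $q_2$. When $A$ arrives at $\sigma_i$, it enters the dead state, because $\sigma_i$ is stored in
the second window. This completes the proof of the induction step.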


5. Concluding remarks and discussion of future research


In this paper we proposed an extension of the notion of finite automata to
finite-memory automata whose inputs are words over infinite alphabets and established
some basic properties of languages accepted by such automata. It was shown that
these languages possess many of the closure and decision properties of ordinary 
regular languages. Also their restrictions to finite alphabets are regular. Even though 
finite-memory automata seem to be a quite reasonable model of computation over 
infinite alphabets, there are no (and cannot be any) formal criteria for accepting them 
(or any other model) as a natural extension of finite automata. Therefore only intuitive 
arguments and “field tests” can be used to support or reject our definition. Based on 
some similarity with finite automata, we argued that our model is the “right one.” 
However we do not exclude the possibility that a deeper investigation of languages 
over infinite alphabets will lead to another more appropriate model. In any case we 
believe that any simple model of computation over infinite alphabets must be very 
close to finite-memory automata. 

We conclude the paper with some problems which, on the one hand, are of interest in
their own right, and, on the other hand, might give a better insight into simple
languages over infinite alphabets.

• In our opinion, the major problem left unresolved in this paper is whether the
containment of quasi-regular languages is decidable. (In [3] we claimed that the
inclusion problem $L'' \subseteq L'$ is decidable for any two quasi-regular languages $L'$ and
$L''$. However, the proof there is not correct, and we are able to solve the problem
only for the case where $L'$ is accepted by a two-window finite-memory automaton,
see the appendix.^13) A positive answer would imply decidability of the emptiness
problem for the languages definable by systems of finite-memory automata, see [3].
These languages are interesting for the following reasons. First, they are exactly the
Boolean closure of the set of the quasi-regular languages. (Thus, in particular, they
are closed under Boolean operations.) Second, systems of finite-memory automata
are analogous to Rabin's automata on infinite words ([6]), and having the decidability
of the emptiness problem for the languages definable by systems of finite-memory
automata, it is very likely that the results of this paper can be extended to
"$\omega$-quasi-regular languages."

• The second problem concerns the relationship of deterministic and nondeterministic
quasi-regular languages: does each quasi-regular language belong to the closure
of deterministic quasi-regular languages under union, intersection, concatenation,
and Kleene star?

• The next problem deals with the languages accepted by two-way deterministic
finite-memory automata. What closure properties do they possess? Are the emptiness
and containment problems for these languages decidable? Is each quasi-regular
language accepted by a two-way deterministic finite-memory automaton?

• Finally, how can finite-memory automata be extended to pushdown automata over
infinite alphabets? There are two possibilities. The first one is to consider pushdown
automata with finite stack alphabets which are able to store input symbols only in
their finite memory. The second possibility is to allow storing input symbols both in
the finite memory and in the stack. What is the relationship between these two types of
pushdown automata? Is it possible to extend the definition of a context-free
grammar to infinite alphabets in such a way that "quasi-context-free languages" are
exactly those accepted by extended pushdown automata?


Appendix A. A decidability result 


Here we prove that for a two-window finite-memory automaton $A'$ and for
a finite-memory automaton $A''$ it is decidable whether $L(A'') \subseteq L(A')$. The decision
algorithm is as follows. Let $A' = \langle S', q_0', u', \rho', \mu', F' \rangle$ and $A'' = \langle S'', q_0'', u'', \rho'', \mu'', F'' \rangle$ be
a 2- and an $r$-window finite-memory automaton, respectively. Assume for a moment
that we are able to compute a positive integer $N$ (that depends on $A'$ and $A''$) such that
if $L(A'') \not\subseteq L(A')$, then the difference $L(A'') - L(A')$ contains a word $\sigma$ of length not
exceeding $N$. Then we can proceed as follows.

Let $\tau_1, \tau_2, \ldots, \tau_N$ be pairwise distinct symbols not belonging to $[u'] \cup [u'']$, and let
$\Sigma' = \{\tau_i\}_{i=1,\ldots,N} \cup [u'] \cup [u'']$. Since $\sigma$ contains at most $N$ distinct symbols, there exists
an automorphism $\iota$ of $\Sigma$ that is an identity on $[u'] \cup [u'']$ such that $\iota(\sigma)$ is a word over
the alphabet $\Sigma'$. By Proposition 2 (that establishes the closure under automorphisms),
$\iota(\sigma) \in L(A'') - L(A')$. Therefore it suffices to check the emptiness of
$(L(A'') - L(A')) \cap \Sigma'^*$. Since $\Sigma'$ is a finite alphabet (its cardinality is at most $N + r + 2$),
by Proposition 1, the languages $L(A') \cap \Sigma'^*$ and $L(A'') \cap \Sigma'^*$ are regular. Now
the decidability result follows from the equality
$(L(A'') - L(A')) \cap \Sigma'^* = (L(A'') \cap \Sigma'^*) - (L(A') \cap \Sigma'^*)$,
the closure of regular languages under Boolean operations,
and the decidability of the emptiness problem for regular languages. To
compute the above constant $N$ we need some preliminary results (four definitions and
five lemmas). Below $A'$ and $A''$ are as in the beginning of the appendix.

13 Even the problem of deciding whether a quasi-regular language is the whole $\Sigma^*$ seems to us to be very
difficult.
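
The renaming step used above, which moves a word of bounded length onto a fixed finite alphabet, can be made concrete. The following Python sketch is only an illustration: the function name `canonical_image` and the representation of the fresh symbols $\tau_i$ as pairs `("tau", i)` are assumptions introduced here, and such pairs are assumed not to occur in the input. The map it computes is injective on the symbols of the word and is the identity on the fixed symbols, so it is the restriction of an automorphism of an infinite alphabet.

```python
def canonical_image(word, fixed):
    """Rename the symbols of `word` outside `fixed` to tau_1, tau_2, ...

    `fixed` plays the role of the symbols of the initial assignments, on which
    the automorphism must be the identity. Distinct symbols receive distinct
    names, in order of first occurrence.
    """
    fresh = {}    # original symbol -> its new name ("tau", i)
    image = []
    for symbol in word:
        if symbol in fixed:
            image.append(symbol)                 # identity on fixed symbols
        else:
            if symbol not in fresh:
                fresh[symbol] = ("tau", len(fresh) + 1)
            image.append(fresh[symbol])
    return image
```

A word of length at most $N$ uses at most $N$ of the fresh names, which is why an alphabet with $N$ additional symbols suffices.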

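With the transition relation $\mu'^c$ we associate a function from $2^{S'^c} \times \Sigma^*$ into $2^{S'^c}$, which
we also denote by $\mu'^c$. (Recall that $S'^c$ denotes the set of all the configurations of $A'$.) The
function $\mu'^c$ is defined by $\mu'^c(C, \epsilon) = C$ and, for a word $\sigma$ and a symbol $\sigma'$,
$\mu'^c(C, \sigma\sigma') = \bigcup_{c \in \mu'^c(C, \sigma)} \{c' : (c, \sigma', c') \in \mu'^c\}$. For
example, $\mu'^c(\{q_0'^c\}, \sigma)$ is the set of all the configurations $A'$ can enter after reading $\sigma$.
Thus $L(A') = \{\sigma : \mu'^c(\{q_0'^c\}, \sigma) \cap F'^c \ne \emptyset\}$. This, in turn, implies that $L(A'') \not\subseteq L(A')$ if and
only if $\mu'^c(\{q_0'^c\}, \sigma) \cap F'^c = \emptyset$ for some $\sigma \in L(A'')$.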
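
The set-valued extension of the transition relation can be sketched in Python as follows. This is an illustration under assumptions, not a definitive implementation: a configuration is represented as a pair of a state and a tuple of window contents, `None` stands for #, the reassignment is a dictionary `rho` (a missing key meaning that no reassignment is possible), the transition relation `mu` is a set of triples (state, window index, state), and the names `successors` and `extended` are hypothetical. The small automaton used as an example is likewise an assumed two-window automaton accepting the words in which some symbol occurs twice.

```python
def successors(config, symbol, rho, mu):
    """All configurations reachable from `config` by reading `symbol`."""
    state, windows = config
    if symbol in windows:
        k = windows.index(symbol) + 1      # symbol already stored in window k
    else:
        k = rho.get(state)                 # window to be overwritten, if any
        if k is None:
            return set()
        windows = windows[:k - 1] + (symbol,) + windows[k:]
    return {(t, windows) for (s, j, t) in mu if s == state and j == k}


def extended(configs, word, rho, mu):
    """Set of configurations reachable from `configs` after reading `word`."""
    for symbol in word:
        configs = {c2 for c in configs for c2 in successors(c, symbol, rho, mu)}
    return configs


# Assumed example: guess a symbol in window 1, then wait for it to reappear.
RHO = {"q0": 1, "q": 2, "f": 2}
MU = {("q0", 1, "q0"), ("q0", 1, "q"), ("q", 2, "q"),
      ("q", 1, "f"), ("f", 1, "f"), ("f", 2, "f")}
INITIAL = ("q0", (None, None))
FINAL = {"f"}


def accepts(word):
    """Membership test: some reachable configuration has a final state."""
    return any(s in FINAL for s, _ in extended({INITIAL}, word, RHO, MU))
```

In contrast with the classical subset construction, the sets computed by `extended` range over an infinite collection as the input alphabet grows, which is the obstacle discussed in the next paragraph.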

This extension of $\mu'^c$ is motivated by the construction of a deterministic finite
automaton from a nondeterministic automaton, see [1, Theorem 2.1, pp. 22-23], and
will play a similar role. However, in our case, the range of $\mu'^c$, in general, is infinite. This
is the reason that some nondeterministic finite-memory automata cannot be converted
into deterministic ones, see Proposition 5 in Section 2.

Next we are going to adapt the classical product construction, see [1, pp. 59-60], to
the above "deterministic version" of $A'$ and $A''$. Recall that we are interested in the
language $L(A'') - L(A')$. First we observe that if $(s_0, w_0), (s_1, w_1), \ldots, (s_n, w_n)$ is a run of
a finite-memory automaton on a word $\sigma_1 \cdots \sigma_n$, then $\sigma_i \in [w_i]$, $i = 1, 2, \ldots, n$. This
invariant property of a run motivates the following two definitions.


Definition A.1. Let $\tau \in \Sigma$. A configuration $(s, w)$ of a finite-memory automaton is called
a $\tau$-configuration if $\tau \in [w]$.


Definition A.2. An $(A', A'')$-configuration is a triple $(\tau, C, c)$, where $\tau \in \Sigma$, $C$ is a finite set
of $\tau$-configurations of $A'$, and $c$ is a $\tau$-configuration of $A''$.


Now we can define a run of the “product” of A’ and A”. 


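Definition A.3. Let $\sigma = \sigma_1 \cdots \sigma_n \in \Sigma^*$. An $(A', A'')$-run on $\sigma$ is a sequence of $(A', A'')$-configurations
$\tilde{C}_1, \tilde{C}_2, \ldots, \tilde{C}_n$, $\tilde{C}_i = (\sigma_i, C_i, c_i)$, such that $C_{i+1} = \mu'^c(C_i, \sigma_{i+1})$ and
$(c_i, \sigma_{i+1}, c_{i+1}) \in \mu''^c$, $i = 0, 1, \ldots, n-1$, where $C_0 = \{q_0'^c\}$ and $c_0 = q_0''^c$.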


Our decision algorithm is based on an analysis of $(A', A'')$-configurations, which are
the states of the "automaton combined from the complement of $A'$ and $A''$." First we
establish an invariant of $(A', A'')$-configurations (under automorphisms of $\Sigma$). It will be
used for computing the positive integer $N$ defined in the first paragraph of the appendix.

Let $C$ be a finite set of $\tau$-configurations of $A'$ and let $\Sigma_C$ be the subset of $\Sigma \cup \{\#\}$
defined by $\Sigma_C = \{\sigma : (s, \tau\sigma) \in C \text{ or } (s, \sigma\tau) \in C\}$. Consider a relation $\equiv_C$ on $\Sigma_C$ such that
$\sigma_1 \equiv_C \sigma_2$ if and only if the following holds. For each $s \in S'$, $(s, \tau\sigma_1) \in C$ if and only if
$(s, \tau\sigma_2) \in C$, and $(s, \sigma_1\tau) \in C$ if and only if $(s, \sigma_2\tau) \in C$. It immediately follows from the
definition of $\equiv_C$ that it is an equivalence relation. The equivalence classes of $\equiv_C$ can be
described as follows. For each nonempty subset $P$ of $S' \times \{1, 2\}$ define a subset $C^P$ of
$\Sigma_C$ by

    $C^P = \{\sigma \in \Sigma_C : (s, \tau\sigma) \in C$ if and only if $(s, 2) \in P$, and
             $(s, \sigma\tau) \in C$ if and only if $(s, 1) \in P\}$.

Then the equivalence classes of $\equiv_C$ are in one-to-one correspondence with those
subsets $P$ of $S' \times \{1, 2\}$ for which $C^P$ is nonempty.
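
As an illustration of the classes $C^P$, the following Python sketch groups the symbols of $\Sigma_C$ by the set $P$ of pairs (state, window position) in which they occur next to $\tau$. The name `classes` and the representation of a configuration as a pair of a state and a pair of window contents are assumptions made for this sketch; the string "#" stands for the empty window.

```python
from collections import defaultdict


def classes(C, tau):
    """Map each nonempty P to the class C^P of symbols with profile P.

    C is a finite set of tau-configurations (s, (w1, w2)) of a two-window
    automaton, so tau occurs in at least one window of each configuration.
    """
    profile = defaultdict(set)    # symbol -> set of (state, position) pairs
    for s, (w1, w2) in C:
        if w1 == tau:
            profile[w2].add((s, 2))   # (s, tau sigma) in C: sigma in window 2
        if w2 == tau:
            profile[w1].add((s, 1))   # (s, sigma tau) in C: sigma in window 1
    groups = defaultdict(set)
    for symbol, P in profile.items():
        groups[frozenset(P)].add(symbol)
    return dict(groups)
```

Two symbols fall into the same group exactly when they are related by $\equiv_C$, and the sizes of the groups are the quantities that enter the last components of the associated vector defined below.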


Let $S'' = \{s_1, s_2, \ldots, s_\ell\}$. Let $L$ be the cardinality of $2^{S' \times \{1,2\}} - \{\emptyset\}$, and let
$\{P_1, P_2, \ldots, P_L\}$ be an enumeration of $2^{S' \times \{1,2\}} - \{\emptyset\}$. With each $(A', A'')$-configuration
$\tilde{C} = (\tau, C, (s_m, w_1, \ldots, w_r))$ we associate a $(2r+2+L)$-dimensional integer vector
$V_{\tilde{C}} = (n_1, n_2, \ldots, n_{2r+2+L})$ that is defined below.

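• For $k = 1, 2, \ldots, r$ the components $n_{2k-1}$ and $n_{2k}$ of $V_{\tilde{C}}$ are defined as follows. If $w_k = \#$, then
$n_{2k-1} = n_{2k} = L+3$. If $w_k = \tau$, then $n_{2k-1} = L+2$ and $n_{2k} = L+4$. If $w_k \notin \Sigma_C \cup \{\#, \tau\}$,
then $n_{2k-1} = L+1$ and $n_{2k} = L+5$. Otherwise, for some $j = 1, 2, \ldots, L$, $w_k \in C^{P_j}$, and
we put $n_{2k-1} = L-j$ and $n_{2k} = L+5+j$.

• $n_{2r+1} = \ell - m$, and $n_{2r+2} = \ell + m$. (Recall that the state part of the third component
of $\tilde{C}$ is $s_m$.)

• For $i = 2r+3, \ldots, 2r+2+L$, $n_i$ is equal to the cardinality of $C^{P_{i-2r-2}}$.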

The notion of the associated vector reflects the structure of a configuration up to an
automorphism of $\Sigma$. Namely, let $\iota$ be an automorphism of $\Sigma$. We can extend $\iota$ to
configurations by $\iota(s, w) = (s, \iota(w))$,^14 and then to $(A', A'')$-configurations by
$\iota(\tau, C, c) = (\iota(\tau), \{\iota(c')\}_{c' \in C}, \iota(c))$. Then we have the following indistinguishability
result.


Lemma A.1. In the above notation, $V_{\tilde{C}} = V_{\iota(\tilde{C})}$.


Proof. It immediately follows from the definition of the extension of $\iota$ to configurations
that $\iota(C)$ and $\iota(c)$ are a set of $\iota(\tau)$-configurations and a $\iota(\tau)$-configuration,
respectively. Therefore for any $\sigma \in \Sigma_C$, $(s, \tau\sigma) \in C$ if and only if $(s, \iota(\tau)\iota(\sigma)) \in \iota(C)$ and
$(s, \sigma\tau) \in C$ if and only if $(s, \iota(\sigma)\iota(\tau)) \in \iota(C)$. The last equivalence implies that for each
$P \in 2^{S' \times \{1,2\}} - \{\emptyset\}$, $\iota(C^P) = (\iota(C))^P$. Thus the vectors $V_{\tilde{C}}$ and $V_{\iota(\tilde{C})}$ have the same last
$L$ components. The equality of the first $2r+2$ components of the vectors $V_{\tilde{C}}$ and
$V_{\iota(\tilde{C})}$ immediately follows from the fact that the first component of $\iota(\tilde{C})$ is $\iota(\tau)$, and the
equalities $\iota(\#) = \#$ and $\iota(C^{P_j}) = (\iota(C))^{P_j}$, $j = 1, 2, \ldots, L$, established above. □


14 Here, as in Lemma 1, we implicitly extend $\iota$ to $\Sigma \cup \{\#\}$ by putting $\iota(\#) = \#$.

Now we introduce a partial order $\le$ on the set of the associated vectors. Let
$V_1 = (n_{1,1}, \ldots, n_{1,2r+2+L})$ and $V_2 = (n_{2,1}, \ldots, n_{2,2r+2+L})$. Then $V_1 \le V_2$ if and only if
$n_{1,i} \le n_{2,i}$, $i = 1, 2, \ldots, 2r+2+L$.


Remark A.1. Let $\tilde{C}_1 = (\tau_1, C_1, c_1)$, $c_1 = (s^1, w_{1,1} \cdots w_{1,r})$ and $\tilde{C}_2 = (\tau_2, C_2, c_2)$, $c_2 =
(s^2, w_{2,1} \cdots w_{2,r})$ be $(A', A'')$-configurations such that $V_{\tilde{C}_1} \le V_{\tilde{C}_2}$. Then $w_{1,k} = \#$ if and
only if $w_{2,k} = \#$, $w_{1,k} = \tau_1$ if and only if $w_{2,k} = \tau_2$, $w_{1,k} \notin \Sigma_{C_1} \cup \{\#, \tau_1\}$ if and only if
$w_{2,k} \notin \Sigma_{C_2} \cup \{\#, \tau_2\}$, and $w_{1,k} \in C_1^{P_j}$ if and only if $w_{2,k} \in C_2^{P_j}$. Moreover, $s^1 = s^2$. These
properties easily follow from the "double inequality data representation" in the first
$2r+2$ components of the associated vectors. Actually, these properties are all that
we need from the definition of $\le$. However, the double inequality representation is
more convenient for a citation of a result from the literature needed for the proof of
Lemma A.5 at the end of the appendix.
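
The componentwise order and the "double inequality" encoding can be illustrated by a short Python sketch. The names `leq`, `encode` and `first_comparable_pair` are hypothetical and are introduced here only for illustration; `encode` shows, for a single integer datum $m$ and an arbitrary constant `base`, why a pair of the form (base - m, base + m) can be dominated only by the pair that encodes the same $m$.

```python
def leq(v1, v2):
    """Componentwise order on integer vectors of equal dimension."""
    return len(v1) == len(v2) and all(a <= b for a, b in zip(v1, v2))


def encode(m, base):
    """Double inequality encoding of m: comparable pairs must encode equal m."""
    return (base - m, base + m)


def first_comparable_pair(vectors):
    """Return the first (i, j), i < j, with vectors[i] <= vectors[j], or None."""
    for j in range(len(vectors)):
        for i in range(j):
            if leq(vectors[i], vectors[j]):
                return (i, j)
    return None
```

For example, `leq(encode(2, 5), encode(3, 5))` fails in the first component and `leq(encode(3, 5), encode(2, 5))` fails in the second, so the order forces equality of the encoded data while still comparing the remaining components (the cardinalities of the classes) in the usual way.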


The nonemptiness of the difference $L(A'') - L(A')$ is tightly related to the notion of
separability defined below.


Definition A.4. Let $C \subseteq S'^c$ be a set of configurations of $A'$, and let $c \in S''^c$ be a configuration
of $A''$. Let $\sigma \in \Sigma^*$ be a word over $\Sigma$. We say that the pair $(C, c)$ is separated by $\sigma$ if
$\sigma \notin \bigcup_{c' \in C} L(A'_{c'})$ and $\sigma \in L(A''_c)$.^15

In particular, $L(A'') - L(A')$ is nonempty if and only if the pair $(\{q_0'^c\}, q_0''^c)$ is
separable. (Recall that $q_0'^c$ and $q_0''^c$ are the initial configurations of $A'$ and $A''$,
respectively.)

There is the following relationship between the separability of (A’, A”)-configura- 
tions and the partial order on the associated vectors. 


Lemma A.2. Let $\tilde{C}_1 = (\tau_1, C_1, c_1)$ and $\tilde{C}_2 = (\tau_2, C_2, c_2)$ be $(A', A'')$-configurations such
that $V_{\tilde{C}_2} \le V_{\tilde{C}_1}$. If the pair $(C_1, c_1)$ can be separated by a word of length $n$, then the pair
$(C_2, c_2)$ can also be separated by a word of length $n$.


Proof. Let $c_1 = (s, w_{1,1} \cdots w_{1,r})$ and $c_2 = (s, w_{2,1} \cdots w_{2,r})$. (Notice that, by Remark A.1,
the state component of $c_1$ is equal to that of $c_2$.) Consider the bijection
$\iota_1 : \{w_{2,1}, \ldots, w_{2,r}\} \to \{w_{1,1}, \ldots, w_{1,r}\}$ defined by $\iota_1(w_{2,k}) = w_{1,k}$, $k = 1, 2, \ldots, r$. By
Remark A.1, $\iota_1$ is well defined. Moreover, $\iota_1(\tau_2) = \tau_1$. Using the inequality $V_{\tilde{C}_2} \le V_{\tilde{C}_1}$ we
can extend $\iota_1$ to an embedding $\iota_2$ of $\Sigma_{C_2} \cup \{w_{2,1}, \ldots, w_{2,r}\}$ into $\Sigma_{C_1} \cup \{w_{1,1}, \ldots, w_{1,r}\}$
such that $\iota_2(C_2^{P_j}) \subseteq C_1^{P_j}$, $j = 1, 2, \ldots, L$. Indeed, by Remark A.1, we have the equality
$\iota_1(C_2^{P_j} \cap \{w_{2,1}, \ldots, w_{2,r}\}) = C_1^{P_j} \cap \{w_{1,1}, \ldots, w_{1,r}\}$, and, since the cardinality of
$C_2^{P_j}$ does not exceed the cardinality of $C_1^{P_j}$, $C_2^{P_j} - \{w_{2,1}, \ldots, w_{2,r}\}$ can be embedded into
$C_1^{P_j} - \{w_{1,1}, \ldots, w_{1,r}\}$.

Let $\sigma_1 = \sigma_{1,1} \cdots \sigma_{1,n}$ be a word of length $n$ that separates $(C_1, c_1)$. Let
$\sigma_2 = \sigma_{2,1} \cdots \sigma_{2,n}$, where $\sigma_{2,i} = \iota_2^{-1}(\sigma_{1,i})$ if $\sigma_{1,i}$ belongs to the range of $\iota_2$, and $\sigma_{2,i} = \sigma_{1,i}$
if $\sigma_{1,i}$ does not belong to the range of $\iota_2$, $i = 1, 2, \ldots, n$. We contend that $\sigma_2$ separates
$(C_2, c_2)$. To prove our contention we have to show that $\sigma_2 \notin \bigcup_{c \in C_2} L(A'_c)$ and
$\sigma_2 \in L(A''_{c_2})$. Let $\iota_3$ be an automorphism of $\Sigma$ such that $\iota_3(\sigma_1) = \sigma_2$. (The existence of $\iota_3$ is
provided by the definition of $\sigma_2$ in the beginning of this paragraph.) Then $\iota_3$ coincides
with $\iota_2^{-1}$ on the range of $\iota_2$. The membership $\sigma_2 \in L(A''_{c_2})$ immediately follows from
Lemma 1 with $\iota = \iota_3$. The proof of the nonmembership $\sigma_2 \notin \bigcup_{c \in C_2} L(A'_c)$ is also easy. If
for some $c \in C_2$, $\sigma_2 \in L(A'_c)$, then we would have $\sigma_1 \in L(A'_{\iota_3^{-1}(c)})$. Since $\iota_3^{-1}(c) \in C_1$, this is
impossible. □


15 As in Sections 2 and 3, for a configuration $c = (s, w)$ of a finite-memory automaton $A = \langle S, q_0, u, \rho, \mu, F \rangle$
we define a finite-memory automaton $A_c$ by $A_c = \langle S, s, w, \rho, \mu, F \rangle$. In particular, $A_{q_0^c} = A$.


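Lemma A.3. Let a positive integer $N$ be such that for every $n \ge N$, for every word
$\sigma = \sigma_1 \cdots \sigma_n$ of length $n$, and for every $(A', A'')$-run $\tilde{C}_1, \tilde{C}_2, \ldots, \tilde{C}_n$ on $\sigma$, the associated
sequence $V_{\tilde{C}_1}, V_{\tilde{C}_2}, \ldots, V_{\tilde{C}_n}$ contains two vectors $V_{\tilde{C}_i}$ and $V_{\tilde{C}_j}$, $i < j$, such that $V_{\tilde{C}_i} \le V_{\tilde{C}_j}$. If
the difference $L(A'') - L(A')$ is not empty, then it contains a word shorter than $N$.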


Proof. Let $\sigma = \sigma_1 \cdots \sigma_n$ be a word of the minimum length belonging to the difference
$L(A'') - L(A')$. We contend that $n < N$. To prove our contention we assume to the
contrary that $n \ge N$. Let $\tilde{C}_1, \tilde{C}_2, \ldots, \tilde{C}_n$ be an $(A', A'')$-run on $\sigma$, $\tilde{C}_i = (\sigma_i, C_i, c_i)$,
$i = 1, 2, \ldots, n$, such that $c_n \in F''^c$. Note that, since $\sigma \notin L(A')$, $C_n \cap F'^c = \emptyset$. Let $i$ and $j$ be as
in the statement of the lemma. Obviously $\sigma_{j+1} \cdots \sigma_n$ separates $(C_j, c_j)$. By Lemma A.2,
there exists a word $\sigma'$ of length $n - j$ that separates $(C_i, c_i)$. Since $\tilde{C}_1, \tilde{C}_2, \ldots, \tilde{C}_i$ is an
$(A', A'')$-run on $\sigma_1\sigma_2 \cdots \sigma_i$, the word $\sigma'' = \sigma_1 \cdots \sigma_i\sigma'$ separates $(\{q_0'^c\}, q_0''^c)$. Therefore
$\sigma'' \in L(A'') - L(A')$. The length of $\sigma''$ is $n - j + i < n$, in contradiction with the minimality
assumption on the length of $\sigma$. □


In order to show how to compute the constant N from Lemma A.3 we need one 
more auxiliary result. 


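Lemma A.4. Let $\iota$ be an automorphism of $\Sigma$ that is an identity on $[u'] \cup [u'']$ and let
$\tilde{C}_1, \tilde{C}_2, \ldots, \tilde{C}_n$ be an $(A', A'')$-run on $\sigma = \sigma_1\sigma_2 \cdots \sigma_n$. Then $\iota(\tilde{C}_1), \iota(\tilde{C}_2), \ldots, \iota(\tilde{C}_n)$ is an
$(A', A'')$-run on $\iota(\sigma)$.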


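Proof. Let $\tilde{C}_i = (\sigma_i, C_i, c_i)$, $i = 1, 2, \ldots, n$. By the definition of the extension of $\iota$ to
$(A', A'')$-configurations, the first component of $\iota(\tilde{C}_i)$ is $\iota(\sigma_i)$, $i = 1, 2, \ldots, n$, and it
follows from the proof of Lemma 1 that $(\iota(c_i), \iota(\sigma_{i+1}), \iota(c_{i+1})) \in \mu''^c$ and
$\iota(C_{i+1}) = \mu'^c(\iota(C_i), \iota(\sigma_{i+1}))$, $i = 0, 1, \ldots, n-1$, where $C_0 = \{q_0'^c\}$ and $c_0 = q_0''^c$. □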


Lemma A.5. Given finite-memory automata $A'$ and $A''$ one can compute a positive
integer $N$ that satisfies the conditions of Lemma A.3.


Proof. Consider a tree $T$ whose nodes are the empty sequence and the sequences
$V_{\tilde{C}_1}, V_{\tilde{C}_2}, \ldots, V_{\tilde{C}_n}$, where $\tilde{C}_1, \tilde{C}_2, \ldots, \tilde{C}_n$ is an $(A', A'')$-run on some word of length $n$. The
sequence $V_{\tilde{C}_{1,1}}, V_{\tilde{C}_{1,2}}, \ldots, V_{\tilde{C}_{1,n_1}}$ is a successor of the sequence $V_{\tilde{C}_{2,1}}, V_{\tilde{C}_{2,2}}, \ldots, V_{\tilde{C}_{2,n_2}}$ if
and only if $n_1 = n_2 + 1$ and $V_{\tilde{C}_{1,i}} = V_{\tilde{C}_{2,i}}$ for $i = 1, 2, \ldots, n_2$. We intend to prove that $T$ is
of a finite branching degree. Moreover, for a given positive integer $n$ we show how to
compute the part of $T$ that consists of all vertices of depth not exceeding $n$. So, given $n$,
let us fix $n$ pairwise distinct symbols $\tau_1, \tau_2, \ldots, \tau_n$ not belonging to $[u'] \cup [u'']$, and let
$\Sigma' = \{\tau_i\}_{i=1,\ldots,n} \cup [u'] \cup [u'']$. Let $V_{\tilde{C}_1}, V_{\tilde{C}_2}, \ldots, V_{\tilde{C}_m}$, $m \le n$, be a node of $T$ that results
from the $(A', A'')$-run $\tilde{C}_1, \tilde{C}_2, \ldots, \tilde{C}_m$ on $\sigma = \sigma_1\sigma_2 \cdots \sigma_m$. There exists an automorphism
$\iota$ of $\Sigma$ that is an identity on $[u'] \cup [u'']$ such that $\iota(\sigma)$ is a word over the alphabet
$\Sigma'$. Therefore, by Lemmas A.1 and A.4, the node $V_{\tilde{C}_1}, V_{\tilde{C}_2}, \ldots, V_{\tilde{C}_m}$ can also be obtained
from an $(A', A'')$-run on $\iota(\sigma)$. That is, for computing the set of all vertices of $T$ of depth
not exceeding $n$ we may restrict ourselves to the $(A', A'')$-runs on all words over the finite
alphabet $\Sigma'$. The number of words over $\Sigma'$ of length not exceeding $n$ is finite, and for
each such word all possible $(A', A'')$-runs on it are computable. Also for any two nodes
of $T$ it is decidable whether one is a successor of the other. This completes the
description of the algorithm for computing the part of $T$ consisting of all vertices of
depth not exceeding $n$.

Using the above algorithm, for each positive integer $N$ we can check whether each
node $V_{\tilde{C}_1}, V_{\tilde{C}_2}, \ldots, V_{\tilde{C}_N}$ of depth $N$ contains two vectors $V_{\tilde{C}_i}$ and $V_{\tilde{C}_j}$, $i < j$, such that
$V_{\tilde{C}_i} \le V_{\tilde{C}_j}$. Now, if a positive integer $N$ satisfying the conditions of Lemma A.3 exists,
then it can be found by checking all nodes of depth 1, then all nodes of depth 2, etc.
(This process must eventually terminate when we arrive at the right $N$.)

To complete the proof we have to show that there indeed exists a positive integer
$N$ satisfying the conditions of Lemma A.3. Assume to the contrary that there is no
such $N$. That is, for each $N = 1, 2, \ldots$ there exists a node $V_{\tilde{C}_{N,1}}, V_{\tilde{C}_{N,2}}, \ldots, V_{\tilde{C}_{N,N}}$ such that
for each $i$ and $j$, $i < j \le N$, $V_{\tilde{C}_{N,i}} \not\le V_{\tilde{C}_{N,j}}$. Since the number of such nodes is infinite and
$T$ is of a finite branching degree, it follows from the König Infinity Lemma that
there exists an infinite path in $T$ that contains infinitely many nodes
$V_{\tilde{C}_{N,1}}, V_{\tilde{C}_{N,2}}, \ldots, V_{\tilde{C}_{N,N}}$ such that for each $i$ and $j$, $i < j \le N$, $V_{\tilde{C}_{N,i}} \not\le V_{\tilde{C}_{N,j}}$. These nodes,
being on the same path, constitute an infinite subset of prefixes of some infinite
sequence $V_{\tilde{C}_1}, V_{\tilde{C}_2}, \ldots, V_{\tilde{C}_n}, \ldots$ Therefore the above infinite sequence contains no two
vectors $V_{\tilde{C}_i}$ and $V_{\tilde{C}_j}$, $i < j$, such that $V_{\tilde{C}_i} \le V_{\tilde{C}_j}$. This contradicts [4, Lemma 4.1], which states
that exactly the opposite holds. □


Now, using Lemmas A.3 and A.5, we can implement the algorithm presented in the 
beginning of the appendix. 